INTRODUCTION

The theory of controllability and observability has been
developed, one might almost say reluctantly, in response to problems
generated by technological science, especially in areas related to
control, communication, and computers. It seems that the first
conscious steps to formalize these matters as a separate area of
(system-theoretic or mathematical) research were undertaken only as
late as 1959, by KALMAN [1960b-c]. There have been, however, many
scattered results before this time (see Section 12 for some historical
comments and references), and one might confidently assert today that
some of the main results have been discovered, more or less
independently, in every country which has reached an advanced stage of
"development", and it is certain that these same results will be
rediscovered again in still more places as other countries progress
on the road to development.

With the perspective afforded by ten years of happenings in
this field, we ought not hesitate to make some guesses of the
significance of what has been accomplished. I see two main trends:

(i) The use of the concepts of controllability and observability
to study nonclassical questions in optimal control and optimal
estimation theory, sometimes as basic hypotheses securing existence,
more often as seemingly technical conditions which allow a sharper
statement of results or shorter proofs.

(ii) Interaction between the concepts of controllability and
observability and the study of structure of dynamical systems, such
as: formulation and solution of the problem of realization,
canonical forms, decomposition of systems.

The first of these topics is older and has been studied
primarily from the point of view of analysis, although the basic
lemma (2.7) is purely algebraic. The second group of topics
may be viewed as "blowing up" the ideas inherent in the basic
lemma (2.7), resulting in a more and more strictly algebraic point
of view.

There is active research in both areas.

In the first, attention has shifted from the case of systems
governed by finite-dimensional linear differential equations with
constant coefficients (where success was quick and total) to systems
governed by infinite-dimensional linear differential equations (delay
differential equations, classical types of partial differential
equations, etc.), to finite-dimensional linear differential equations
with time-dependent coefficients, and finally to all sorts
and subsorts of nonlinear differential equations. The first two
topics are surveyed concurrently by WEISS [1969] while MARKUS [1965]
looks at the nonlinear situation.

My own current interest lies in the second stream, and these
lectures will deal primarily with it, after a rather hurried overview
of the general problem and of the "classical" results.

Let us take a quick look at the most important of these "classical"
results. For convenience I shall describe them in system-theoretic
(rather than conventional pure mathematical) language. The
mathematically trained reader should have no difficulty in converting
them into his preferred framework, by digging a little into the
references.

In area (i), the most important results are probably those
which give more or less explicit and computable criteria for
controllability and observability of certain specific classes of systems.
Beyond these, there seem to be two main theorems:

THEOREM A. A real, continuous-time, n-dimensional, constant,
linear dynamical system $\Sigma$ has the property "every set of n
eigenvalues may be produced by suitable state feedback" if and
only if $\Sigma$ is completely controllable.

The central special case is treated in great detail by KALMAN,
FALB, and ARBIB [1969, Chapter 2, Theorem 5.10]; for a proof of the
general case with background comments, refer to WONHAM [1967]. As
a particular case, we have that every system satisfying the hypotheses
of the theorem can be "stabilized" (made to have eigenvalues with
negative real parts) via a suitable choice of feedback. This result
is the "existence theorem" for algorithms used to construct control
systems for the past three decades, and yet a conscious formulation
of the problem and its mathematical solution go back to about 1963.
(See Theorem D below.) The analogous problem for nonconstant linear
systems (governed by linear differential equations with variable
coefficients) is still not solved.

THEOREM B. ("Duality Principle") Every problem of controllability
in a real, (continuous-time, or discrete-time), finite-dimensional,
constant, linear dynamical system is equivalent to
an observability problem in a dual system.

This fact was first observed by KALMAN [1960a] in the solution
of the optimal stochastic filtering problem for discrete-time
systems, and was soon applied to several problems in system theory by
KALMAN [1960b-c]. See also many related comments by KALMAN, FALB,
and ARBIB [1969, Chapters 2 and 6]. As a theorem, this principle
is not yet known to be valid outside the linear area, but as an
intuitive prescription it has been rather useful in guiding
system-theoretic research. The problems involved here are those of
formulation rather than proof. The basic difficulties seem to point toward
algebra and in particular category theory. System-theoretic
duality, like the categoric one, is concerned with "reversing
arrows". See Section 10 for a modern discussion of these points
and a precise version of Theorem B.

Partly as a result of the questions raised by Theorem B and
partly because of the algebraic techniques needed to prove Theorem
A and related lemmas, attention in the early 1960's shifted toward
certain problems of a structural nature which were, somewhat
surprisingly at first, found to be related to controllability and
observability. The main theorems again seem to be two:

THEOREM C. (Canonical Decomposition) Every real (continuous-time
or discrete-time), finite-dimensional, constant, linear dynamical
system may be canonically decomposed into four parts, of which only
one part, that which is completely controllable and completely
observable, is involved in the input/output behavior of the system.

The proof given by KALMAN [1962] applies to nonconstant systems
only under the severe restriction that the dimensions of the subspaces
of all controllable and of all unobservable states are constant
on the whole real line. The result represented by Theorem C is far from
definitive, however, since finite-dimensional, linear, nonconstant systems
admit at least four different canonical decompositions: it is
possible and fruitful to dualize the notions of controllability
and observability, thereby arriving at four properties, presently
called

reachability and controllability

as well as

constructibility* and observability.

(See Section 2 for definitions.) Any combination of a property from
the first list with a property from the second list gives a canonical
decomposition result analogous to Theorem C. The complexity of
the situation was first revealed by WEISS and KALMAN [1965]; this
paper contributed to a revival of interest (with hopes of success)
in the special problems of nonconstant linear systems. Recent
progress is surveyed by WEISS [1969].

*WEISS [1969] uses "determinability" instead of constructibility.
The new terminology used in these lectures is not yet
entirely standard.

Intimately related to the
canonical structure theorem, and in fact necessary to fully clarify
the phrase "involved in the input/output behavior of the system", is
the last basic result:

THEOREM D. (Uniqueness of Minimal Realization) Given the
impulse-response matrix W of a real, continuous-time,
finite-dimensional, linear dynamical system, there exists a real,
continuous-time, finite-dimensional, linear dynamical system $\Sigma_W$ which

(a) realizes W: that is, the impulse-response matrix of
$\Sigma_W$ is equal to W;

(b) has minimal dimension in the class of linear systems
satisfying (a);

(c) is completely controllable and completely observable;

(d) is uniquely determined (modulo the choice of a basis
at each t for its state space) by requirement (a)
together with (b) or, independently, by (a) together with
(c).

In short, for any W as described above, there is an "essentially
unique" $\Sigma_W$ of the same "type" which satisfies (a) through (c).

COROLLARY 1. If W comes from a constant system, there is a
constant $\Sigma_W$ which satisfies (a) through (c), and is uniquely
determined by (a) + (b) or (a) + (c) (modulo a fixed choice of
basis for its state space).



R.E.  Kalman 

COROLLARY 2. All claims of Corollary 1 continue to hold if
"impulse-response matrix of a constant, finite-dimensional system"
is replaced by "transfer function matrix of a constant,
finite-dimensional system".

The first general discussion of the situation with an equivalent
statement of Theorem D is due to KALMAN [1963b, Theorems 7
and 8]. (This paper does not include complete proofs, or even
an explicit statement of Corollaries 1 and 2, although they are
implied by the general algorithm given in Section 7. An edited
version of the original unpublished proof of Theorem D is given
in KALMAN, FALB, and ARBIB [1969, Chapter 10, Appendix C].)

These results are of great importance in engineering system
theory since they relate methods based on the Laplace transform
(using the transfer function of the system) and the time-domain
methods based on input/output data (the matrix W) to the
state-variable (dynamical system) methods developed in 1955-1960. In
fact, by Corollary 1 it follows that the two methods must yield
identical results; for instance, starting with a constant
impulse-response matrix W, property (c) implies that the existence
of a stable control law is always assured by virtue of Theorem A.
Thus it is only after the development represented by Theorems A-D
that a rigorous justification is obtained for the intuitive design
methods used in control engineering.

As with Theorem C, certain formulational difficulties arise
in connection with a precise definition of a "nonconstant linear
dynamical system". Thus, it seems preferable at present to replace
in Theorem D "impulse-response matrix W" by "weighting pattern W"
(or "abstract input/output map W") and "complete controllability"
by "complete reachability". The definitive form of the 1963 theorem
evolved through the works of WEISS and KALMAN [1965], YOULA [1966],
and KALMAN; a precise formulation and modernized proof of Theorem D
in the weighting pattern case was given recently by KALMAN, FALB,
and ARBIB [1969, Chapter 10, Section 13]. A completely general
discussion of what is meant by a "minimal realization" of a
nonconstant impulse-response matrix involves many technical
complications due to the fact that such a minimal realization does not
exist in the class of linear differential equations with "nice"
coefficient functions. For the current status of this problem,
consult especially DESOER and VARAIYA [1967], SILVERMAN and MEADOWS
[196?], KALMAN, FALB, and ARBIB [1969, Chapter 10, Section 13] and
WEISS [1969].

From the standpoint of the present lectures, by far the most
interesting consequence of Theorem D is its influence, via efforts
to arrive at a definitive proof of Corollary 1, on the development
of the algebraic stream of system theory. The first proof of this
important result (in the special case of distinct eigenvalues) is
that of GILBERT [1963]. Immediately afterwards, a general proof
was given by KALMAN [1963b, Section 7]. This proof, strictly
computational and linear algebraic in nature, yields no theoretical
insight although it is useful as the basis of a computer algorithm.

Using the classical theory of invariant factors, KALMAN [1965a]
succeeded in showing that the solution of the minimal realization
problem can be effectively reduced to the classical invariant-factor
algorithm. This result is of great theoretical interest
since it strongly suggests the now standard module-theoretic
approach, but it does not lead to a simple proof of Corollary 1
and is not a practical method of computation.

The best known proof of Corollary 1 was obtained in 1965 by
B. L. Ho, with the aid of a remarkable algorithm, which is equally important
from a theoretical and computational viewpoint. The early formulation
of the algorithm was described by HO and KALMAN [1966], with
later refinements discussed in HO and KALMAN [1969], KALMAN, FALB,
and ARBIB [1969, Chapter 10, Section 11] and KALMAN [1969c].

Almost simultaneously with the work of B. L. Ho, the basic results
were discovered independently also by YOULA and TISSI [1966] and
by SILVERMAN [1966]. The subject goes back to the 19th century
and centers around the theory of Hankel matrices; however, many
of the results just referenced seem to be fundamentally new. This
field is currently in a very active stage of development. We shall
discuss the essential ideas involved in Sections 8-9. Many other
topics, especially Silverman's generalization of the algorithm to
nonconstant systems, unfortunately cannot be covered due to lack of
time.

[...] dynamics, flows, abstract dynamics, etc.) usually connotes the action
of a one-parameter group T (the reals) on a set X, where X is
at least a topological space (more often, a differentiable manifold)
and the action is at least continuous. This setup is physically
motivated, but in a very old-fashioned sense. A "dynamical system"
as just defined is an idealization, generalization, and abstraction
of Newton's world view of the Solar System as described via a finite set of
nonlinear ordinary differential equations. These equations represent
the positions and momenta of the planets regarded as point masses and
are completely determined by the laws of gravitation, i.e., they do
not contain any terms to account for "external" forces that may act
on the system.

Interesting as this notion of a dynamical system may be (and
is!) in pure mathematics, it is much too limited for the study of
those dynamical systems which are of contemporary interest. There
are at least three different ways in which the classical concept
must be generalized:

(i) The time set of the system is not necessarily restricted
to the reals;

(ii) A state $x \in X$ of the system is not merely acted upon by
the "passage of time" but also by inputs which are or could be
manipulated to bring about a desired type of behavior;

(iii) The states of the system cannot, in general, be observed.
Rather, the physical behavior of the system is manifested through
its outputs which are many-to-one functions of the state.

The generalization of the time set is of minor interest to us
here. The notions of input and output, however, are exceedingly
fundamental; in fact, controllability is related to the input and
observability to the output. With respect to dynamical systems in
the classical sense, neither controllability nor observability is
a meaningful concept.

A much more detailed discussion of dynamical systems in the modern
sense, together with rather detailed precise definitions, will be
found in KALMAN, FALB, and ARBIB [1969, Chapter 1].

From here on, we will use the term "dynamical system" exclusively
in the modern sense (we have already done so in the Introduction).

The following symbols will have a fixed meaning throughout the
paper:

(1.1)
    $T$ = time set,
    $U$ = set of input values,
    $X$ = state set,
    $Y$ = set of output values,
    $\Omega$ = input functions,
    $\varphi$ = transition map,
    $\eta$ = readout map.

The following assumptions will always apply (otherwise the sets
above are arbitrary):

(1.2)
    $T$ = an ordered subset of the reals $R$,
    $\Omega$ = class of functions $T \to U$ such that

    (i) each function $\omega$ is undefined outside some
        finite interval $J_\omega \subset T$ dependent on $\omega$;

    (ii) if $J_\omega \cap J_{\omega'} = \emptyset$, there is a function
        $\omega'' \in \Omega$ which agrees with $\omega$ on $J_\omega$ and
        with $\omega'$ on $J_{\omega'}$.

For most purposes later, $T$ will be equal to $Z$ = (ordered)
abelian group of integers; $U$, $X$, $Y$, $\Omega$ will be linear spaces;
"undefined" can be replaced by "equal to 0"; and "functions undefined
outside a finite interval" will mean the same as "finite sequences".

The most general notion of a dynamical system for our present
needs is given by the following

(1.3) DEFINITION. A dynamical system $\Sigma$ is a composite object
consisting of the maps $\varphi$, $\eta$ defined on the sets $T$, $U$,
$\Omega$, $X$, $Y$ (as above):

    $\varphi: T \times T \times X \times \Omega \to X$,
        $(t; \tau, x, \omega) \mapsto \varphi(t; \tau, x, \omega)$,
        undefined whenever $\tau > t$;

    $\eta: T \times X \to Y$, $(t, x) \mapsto \eta(t, x)$.

The transition map $\varphi$ satisfies the following assumptions:

(1.4)  $\varphi(t; t, x, \omega) = x$;

(1.5)  $\varphi(t; \tau, x, \omega) = \varphi(t; s, \varphi(s; \tau, x, \omega), \omega)$;

(1.6)  if $\omega = \omega'$ on $[\tau, t]$, then for all $s \in [\tau, t]$

       $\varphi(s; \tau, x, \omega) = \varphi(s; \tau, x, \omega')$.

The definition of a dynamical system on this level of generality
should be regarded only as a scaffolding for the terminology;
interesting mathematics begins only after further hypotheses are made. For
instance, it is usually necessary to endow the sets $T$, $U$, $\Omega$, $X$, and
$Y$ with a topology and then require that $\varphi$ and $\eta$ be continuous.

(1.7) EXAMPLE. The classical setup in topological dynamics may
be deduced from our Definition (1.3) in the following way. Let
$T = R$ = reals, regarded as an abelian group under the usual addition
and having the usual topology; let $\Omega$ consist only of the
nowhere-defined function; let $X$ be a topological space; disregard $Y$ and
$\eta$ entirely; define $\varphi$ for all $t, \tau \in T$ and write it as

    $\varphi(t; \tau, x, \omega) = x \cdot (t - \tau)$,

that is, a function of $x$ and $t - \tau$ alone. Check (1.4-5); in
the new notation they become

    $x \cdot 0 = x$ and $x \cdot (s + t) = (x \cdot s) \cdot t$.

Finally, require that the map $(x, t) \mapsto x \cdot t$ be continuous.
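
The two identities of Example (1.7) can be checked numerically on a concrete flow. The following is only a minimal sketch: the one-dimensional flow x·t = x exp(rate·t) on X = R, the function name `flow`, and the parameter `rate` are illustrative assumptions and do not come from the lectures.

```python
import math


def flow(x, t, rate=-0.5):
    """Hypothetical flow x . t = x * exp(rate * t) on X = R (illustrative only)."""
    return x * math.exp(rate * t)


def check_flow_axioms(x, s, t):
    """Check x.0 = x and x.(s + t) = (x.s).t for the sample flow above."""
    identity_ok = flow(x, 0.0) == x
    group_ok = math.isclose(flow(x, s + t), flow(flow(x, s), t), rel_tol=1e-12)
    return identity_ok and group_ok


if __name__ == "__main__":
    # Both identities should hold up to floating-point rounding.
    print(check_flow_axioms(3.0, 0.7, 1.9))
```

Any other flow could be substituted for `flow`; the exponential is chosen only because the group property then reduces to the addition law for exponents.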

(1.8) INTERPRETATION. The essential idea of Definition (1.3) is
that it axiomatizes the notion of state. A dynamical system is informally
a rule for state transitions (the function $\varphi$), together with suitable
means of expressing the effect of the input on the state and the effect
of the state on the output (the function $\eta$). The map $\varphi$ is verbalized
as follows: "an input $\omega$, applied to the system $\Sigma$ in state $x$ at
time $\tau$ produces the state $\varphi(t; \tau, x, \omega)$ at time $t$." The peculiar
definition of an input function $\omega$ is used here mainly for technical
convenience; by (1.6) only equivalence classes of inputs agreeing over
$[\tau, t]$ enter into the determination of $\varphi(t; \tau, x, \omega)$. "$\omega$ not defined"
at $t$ means no input acts on $\Sigma$ at time $t$.

The pair $(\tau, x) \in T \times X$ will be called an event of a dynamical
system $\Sigma$.

In the sequel, we shall be concerned primarily with systems which
are finite-dimensional, linear, and continuous-time or discrete-time.
Often these systems will be also real and constant (= stationary or
time-invariant). We leave the precise definition of these terms in
the context of Definition (1.3) to the reader (consult KALMAN, FALB,
and ARBIB [1969, Chapter 1] as needed) and proceed to make some ad hoc
definitions without detailed explanation.

The following conventions will remain in force throughout the
lectures whenever the linear case is discussed:

(1.9) Continuous-time. $T = R$, $U = R^m$, $X = R^n$, $Y = R^p$,
$\Omega$ = all continuous functions $R \to R^m$ which vanish
outside a finite interval.

(1.10) Discrete-time. $T = Z$, $K$ = fixed field (arbitrary),
$U = K^m$, $X = K^n$, $Y = K^p$, $\Omega$ = all functions
$Z \to K^m$ which are zero for all but a finite number of
their arguments.

Now we have, finally,

(1.11) DEFINITION. A real, continuous-time, n-dimensional, linear
dynamical system $\Sigma$ is a triple of continuous matrix functions of
time $(F(\cdot), G(\cdot), H(\cdot))$ where

    $F(\cdot): R \to \{n \times n \text{ matrices over } R\}$,
    $G(\cdot): R \to \{n \times m \text{ matrices over } R\}$,
    $H(\cdot): R \to \{p \times n \text{ matrices over } R\}$.

These maps determine the equations of motion of $\Sigma$ in the following
manner:

(1.12)
    $dx/dt = F(t)x + G(t)\omega(t)$,
    $y(t) = H(t)x(t)$,

where $t \in R$, $x \in R^n$, $\omega(t) \in R^m$, and $y(t) \in R^p$.


To check that (1.12) indeed makes $\Sigma$ into a well-defined dynamical
system in the sense of Definition (1.3), it is necessary to recall the
basic facts about finite systems of ordinary linear differential equations
with continuous coefficients. Define the map

    $\Phi_F(t, \tau): R \times R \to \{n \times n \text{ matrices over } R\}$

to be the family of $n \times n$ matrix solutions of the linear differential
equation

    $dx/dt = F(t)x$, $x \in R^n$,

subject to the initial condition

    $\Phi_F(t, t) = I$ = unit matrix, $t \in R$.

Then $\Phi_F$ is of class $C^1$ in both arguments. It is called the
transition matrix of (the system $\Sigma$ whose "infinitesimal" transition
matrix is) $F(\cdot)$. From this standard result we get easily also the
fact that the transition map of $\Sigma$ is explicitly given by

(1.13)  $\varphi(t; \tau, x, \omega) = \Phi_F(t, \tau)x + \int_\tau^t \Phi_F(t, s)G(s)\omega(s)\,ds$

while the readout map is given by

(1.14)  $\eta(t, x) = H(t)x$.

It is instructive to verify that $\varphi$ indeed depends only on the
equivalence class of $\omega$'s which agree on $[\tau, t]$.

In view of the classical terminology "linear differential equations
with constant coefficients", we introduce the nonstandard

(1.15) DEFINITION. A real, continuous-time, finite-dimensional
linear dynamical system $\Sigma = (F(\cdot), G(\cdot), H(\cdot))$ is called constant
iff all three matrix functions are constant.

In strict analogy with (1.15), we say:

(1.16) DEFINITION. A discrete-time, finite-dimensional, linear,
constant dynamical system $\Sigma$ over $K$ is a triple $(F, G, H)$ of
$n \times n$, $n \times m$, $p \times n$ matrices over $K$. These matrices
determine the equations of motion of $\Sigma$ in the following manner:

(1.17)
    $x(t + 1) = Fx(t) + G\omega(t)$,
    $y(t) = Hx(t)$,

where $t \in Z$, $x \in K^n$, $\omega(t) \in K^m$, and $y(t) \in K^p$.
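
To see how the discrete-time equations of motion fit Definition (1.3), the transition map can be built by iterating x(t + 1) = Fx(t) + Gω(t) and the axioms (1.4) and (1.5) checked directly. The sketch below is an illustration under assumptions, not part of the lectures: the names `transition` and `readout`, the representation of an input as a dictionary with finitely many entries (a missing time meaning "input equal to 0"), and the sample matrices are all hypothetical.

```python
def mat_vec(M, v):
    """Product of a matrix (list of rows) with a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def vec_add(a, b):
    """Componentwise sum of two vectors."""
    return [p + q for p, q in zip(a, b)]


def transition(F, G, t, tau, x, omega):
    """phi(t; tau, x, omega) for x(k + 1) = F x(k) + G omega(k).

    `omega` maps finitely many integer times to input vectors;
    a time that is absent is treated as the zero input.
    """
    if tau > t:
        raise ValueError("phi is undefined whenever tau > t")
    m = len(G[0])
    state = list(x)
    for k in range(tau, t):
        u = omega.get(k, [0] * m)
        state = vec_add(mat_vec(F, state), mat_vec(G, u))
    return state


def readout(H, x):
    """eta(t, x) = H x for a constant system."""
    return mat_vec(H, x)


if __name__ == "__main__":
    # Sample data (assumed): a double-shift system with one input, one output.
    F = [[0, 1], [0, 0]]
    G = [[0], [1]]
    H = [[1, 0]]
    omega = {0: [1], 1: [2]}
    x0 = [0, 0]

    # (1.4): phi(t; t, x, omega) = x.
    print(transition(F, G, 0, 0, [5, 7], omega) == [5, 7])

    # (1.5): going from time 0 to 3 equals going 0 -> 1 and then 1 -> 3.
    direct = transition(F, G, 3, 0, x0, omega)
    via_1 = transition(F, G, 3, 1, transition(F, G, 1, 0, x0, omega), omega)
    print(direct == via_1)
    print(readout(H, transition(F, G, 2, 0, x0, omega)))
```

Integer entries are used so that the comparisons are exact; over another field K the same loop applies with the field's arithmetic in place of Python integers.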

In the sequel, we shall use the notations $(F, G, -)$ or
$(F, -, H)$ to denote systems possessing certain properties which
are true for any $H$ or $G$.

Finally, we adopt the following convention, which is already
implicit in the preceding discussion:

(1.18) DEFINITION. The dimension n of a dynamical system
$\Sigma$ is equal to the dimension of $X_\Sigma$ as a vector space.

2. STANDARDIZATION OF DEFINITIONS AND "CLASSICAL" RESULTS

In this section, we shall be mainly interested in
finite-dimensional linear dynamical systems, although the first two
definitions will be quite general.

Let $\Sigma$ be an arbitrary dynamical system as defined in
Section 1. We assume the following slightly special property:
There exists a state $x^*$ and an input $\omega^*$ such that

    $\varphi(t; \tau, x^*, \omega^*) = x^*$ for all $t, \tau \in T$ and $t > \tau$.

For simplicity, we write $x^*$ and $\omega^*$ as 0. (When $X$
and $\Omega$ have additive structure, 0 will have the usual
meaning.) The next two definitions refer to dynamical systems
with this extra property.

(2.1) DEFINITION. An event $(\tau, x)$ is controllable iff*
there exists a $t \in T$ and an $\omega \in \Omega$ (both $t$ and $\omega$ may depend
on $(\tau, x)$) such that

    $\varphi(t; \tau, x, \omega) = 0$.

In words: an event is controllable iff it can be transferred
to 0 in finite time by an appropriate choice of the input function
$\omega$. Think of the path from $(\tau, x)$ to $(t, 0)$ as the graph of a
function defined over $[\tau, t]$.

*The technical word iff means if and only if.

Consider now a reflection of this graph about $\tau$. This
suggests a new definition which is a kind of "adjoint" of the
definition of controllability:

(2.2) DEFINITION. An event $(\tau, x)$ is reachable iff there
is an $s \in T$ and an $\omega \in \Omega$ (both $s$ and $\omega$ may depend on
$(\tau, x)$) such that

    $x = \varphi(\tau; s, 0, \omega)$.

We emphasize: controllability and reachability are entirely
different concepts. A striking example of this fact is encountered
below in Proposition (4.26).

We shall now review briefly some well-known criteria for and
relations between reachability and controllability in linear systems.

(2.3) PROPOSITION. In a real, continuous-time, finite-dimensional,
linear dynamical system $\Sigma = (F(\cdot), G(\cdot), -)$, an event $(\tau, x)$ is

(a) reachable if and only if $x \in \text{range } \hat{W}(s, \tau)$ for
some $s \in R$, $s < \tau$, where

    $\hat{W}(s, \tau) = \int_s^\tau \Phi_F(\tau, \sigma)G(\sigma)G'(\sigma)\Phi_F'(\tau, \sigma)\,d\sigma$;

(b) controllable if and only if $x \in \text{range } W(\tau, t)$ for
some $t \in R$, $t > \tau$, where

    $W(\tau, t) = \int_\tau^t \Phi_F(\tau, s)G(s)G'(s)\Phi_F'(\tau, s)\,ds$.

The original proof of (b) is in KALMAN [1960b]; both cases
are treated in detail in KALMAN, FALB, and ARBIB [1969, Chapter 2,
Section 2]. Note that if $G(\cdot)$ is identically zero on $(-\infty, \tau)$
we cannot have reachability, and if $G(\cdot)$ is identically
zero on $(\tau, +\infty)$ we cannot have controllability.

For a constant system, the integrals above depend only on
the difference of the limits; hence, in particular

    $W(\tau, t) = \hat{W}(2\tau - t, \tau)$.

So we have

(2.4) PROPOSITION. In a real, continuous-time, finite-dimensional,
linear, constant dynamical system an event $(\tau, x)$ is reachable
for all $\tau$ if and only if it is reachable for one $\tau$; an event
is reachable if and only if it is controllable.

From (2.3) one can obtain in a straightforward fashion also
the following much stronger result:

(2.5) THEOREM. In a real, continuous-time, n-dimensional,
linear, constant dynamical system $\Sigma = (F, G, -)$ a state $x$
is reachable (or, equivalently, controllable) at any $\tau \in R$
if and only if

    $x \in \text{span}\,(G, FG, \ldots) \subset R^n$;

if this condition is satisfied, we can choose $s = \tau - \delta$, $t = \tau + \delta$,
with $\delta > 0$ arbitrary. (The span of a sequence of matrices is to
be interpreted as the vector space generated by the columns of
these matrices.)

A proof of (2.5) may be found in KALMAN, HO, and NARENDRA
[1963] and in KALMAN, FALB, and ARBIB [1969, Chapter 2, Section
3]. A trivial but noteworthy consequence is the fact that the
definition of reachable states of $\Sigma$ is "coordinate-free":

(2.6) COROLLARY. The set of reachable (or controllable)
states of $\Sigma$ in Theorem (2.5) is a subspace of the real vector
space $X_\Sigma$, the state space of $\Sigma$.

Very often the attention to individual states is unnecessary
and therefore many authors prefer to use the terminology "$\Sigma$ is
completely reachable at $\tau$" for "every event $(\tau, x)$, $\tau$ = fixed,
$x \in X_\Sigma$, is reachable", or "$\Sigma$ is completely reachable" for "every
event in $\Sigma$ is reachable", etc. Thus (2.5), together with the
Cayley-Hamilton theorem, implies the

(2.7) BASIC LEMMA. A real, continuous-time, n-dimensional,
linear, constant dynamical system $\Sigma = (F, G, -)$ is completely
reachable if and only if

(2.8)  $\text{rank}\,(G, FG, \ldots, F^{n-1}G) = n$.
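
The rank condition (2.8) is easy to evaluate mechanically. The following sketch, which is not taken from the lectures, assembles the block matrix (G, FG, ..., F^{n-1}G) and computes its rank by Gaussian elimination over the rationals; the function names `reachability_matrix`, `rank`, and `is_completely_reachable`, as well as the two sample systems, are assumptions made for illustration.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def reachability_matrix(F, G):
    """Return the n x (n*m) block matrix (G, FG, ..., F^(n-1) G)."""
    n = len(F)
    blocks, block = [], [row[:] for row in G]
    for _ in range(n):
        blocks.append(block)
        block = mat_mul(F, block)
    # Concatenate the blocks horizontally, row by row.
    return [sum((b[i] for b in blocks), []) for i in range(n)]


def rank(M):
    """Rank of a matrix with rational entries, by exact Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def is_completely_reachable(F, G):
    """Test condition (2.8): rank (G, FG, ..., F^(n-1) G) = n."""
    return rank(reachability_matrix(F, G)) == len(F)


if __name__ == "__main__":
    # Sample systems (assumed for illustration).
    print(is_completely_reachable([[0, 1], [0, 0]], [[0], [1]]))
    print(is_completely_reachable([[1, 0], [0, 2]], [[1], [0]]))
```

Exact rational arithmetic is used here only to avoid tolerance questions in the rank computation; with floating-point data a numerically robust rank estimate would be needed instead.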

Condition (2.8) is very well-known; it or equivalent forms of
it have been discovered, explicitly used, or implicitly assumed by
many authors. A trivially equivalent form of (2.7) is given by

(2.9) COROLLARY 1. A constant system $\Sigma = (F, G, -)$ is
completely reachable if and only if the smallest F-invariant
subspace of $X_\Sigma$ containing (all column vectors of) $G$ is $X_\Sigma$
itself.

A useful variant of the last fact is given by

(2.10) COROLLARY 2. (W. Hahn) A constant system $\Sigma = (F, G, -)$
is completely reachable if and only if there is no nonzero
eigenvector of $F$ which is orthogonal to (every column vector of) $G$.


Finally, let us note that, far from being a technical condition,
(2.5) has a direct system-theoretic interpretation, as
follows:

(2.11) PROPOSITION. The state space $X_\Sigma$ of a real,
continuous-time, n-dimensional, linear, constant dynamical system
$\Sigma = (F, G, -)$ may be written as a direct sum

    $X_\Sigma = X_1 \oplus X_2$,

which induces a decomposition of the equations of motion as (obvious
notations)

(2.12)
    $dx_1/dt = F_{11}x_1 + F_{12}x_2 + G_1u(t)$,
    $dx_2/dt = F_{22}x_2$.

The subsystem $\Sigma_1 = (F_{11}, G_1, -)$ is completely reachable. Hence
a state $x = (x_1, x_2) \in X_\Sigma$ is reachable if and only if $x_2 = 0$.

PROOF. We define $X_1$ to be the set of reachable states
of $\Sigma$; by (2.5) this is an F-invariant subspace of $X_\Sigma$. Hence, by
finite-dimensionality, $X_1$ is a direct summand in $X_\Sigma$. By construction,
every state in $X_1$ is reachable, and (every column vector of)
G belongs to $X_1$. The F-invariance of $X_1$ implies that
$F_{21} = 0$, which implies the asserted form of the equations of
motion. □
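The decomposition in (2.12) can be seen on a two-dimensional example. The sketch below is illustrative only: the matrices F and G, the change-of-basis matrices, and the function name `decompose` are assumptions chosen so that the reachable subspace is spanned by (1, 1). The columns of T list a basis of $X_1$ followed by a freely chosen basis of a complement $X_2$.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def decompose(F, G, T, T_inv):
    """Express F and G in the basis given by the columns of T."""
    assert matmul(T_inv, T) == [[1, 0], [0, 1]], "T_inv must invert T"
    return matmul(T_inv, matmul(F, T)), matmul(T_inv, G)

# Hypothetical system: G and FG are both multiples of (1, 1),
# so X_1 = span{(1, 1)} is the set of reachable states.
F = [[1, 1], [0, 2]]
G = [[1], [1]]

# Complement X_2 = span{(0, 1)}.
F_a, G_a = decompose(F, G, T=[[1, 0], [1, 1]], T_inv=[[1, 0], [-1, 1]])
print(F_a, G_a)  # [[2, 1], [0, 1]] [[1], [0]]

# A different complement X_2 = span{(1, 0)}.
F_b, G_b = decompose(F, G, T=[[1, 1], [1, 0]], T_inv=[[0, 1], [1, -1]])
print(F_b, G_b)  # [[2, 0], [0, 1]] [[1], [0]]
```

In both coordinate systems the lower-left block $F_{21}$ and the lower block of G vanish, as the proposition asserts, while the block $F_{12}$ depends on which complement was picked.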

(2.13) REMARK. Note that $X_2$ is not intrinsically defined
(it depends on an arbitrary choice in completing the direct sum).
Hence to say that "$(0, x_2)$ is an unreachable (or uncontrollable)
state if $x_2 \neq 0$" is an abuse of language. More precisely: the
set of all reachable (or controllable) states has the structure of
a vector space, but the set of all unreachable (or uncontrollable)
states does not have such structure. This fact is important to
bear in mind for the algebraic development which follows after
this section and also in the definition of observability and
constructibility below. In general, the direct sum cannot be
chosen in such a way that $F_{12} = 0$.

While condition (2.8) has been frequently used as a technical
requirement in the solution of various optimal control problems in
the late 1950's, it was only in 1959-60 that the relation between
(2.8) and system-theoretic questions was clarified by KALMAN [1960b-c]
via Definition (2.2) and Propositions (2.5) and (2.11). (See Section
11 for further details.) In other words, without the preceding
discussion the use of (2.8) may appear to be artificial, but in fact
it is not, at least in problems in which control enters, because,
by (2.12), control problems stated for $X_\Sigma$ are nontrivial only with
respect to the intrinsic subspace $X_1$.

The hypothesis "constant" is by no means essential for
Proposition (2.11), but we must forego further comments here.

For later purposes, we state some facts here for discrete-time,
constant linear systems analogous to those already developed
for their continuous-time counterparts. The proofs are straightforward
and therefore omitted (or given later, for illustrative
purposes).

(2.14) PROPOSITION. A state x of a real, discrete-time,
n-dimensional, linear, constant dynamical system $\Sigma = (F, G, -)$ [...]

Note also that Proposition (2.11) and its proof continue
to be correct, without any modification, when "continuous-time"
is replaced by "discrete-time".

Now we turn to a discussion of observability.

The original definition of observability by KALMAN [1960b,
Definition (5.23)] was concocted in such a way as to take advantage
of vector-space duality. The conceptual problems surrounding
duality are easy to handle in the linear case but are still
by no means fully understood in the nonlinear case (see Section
10). In order to get at the main facts quickly, we shall consider
here only the linear case and even then we shall use the underlying
idea of vector-space duality in a rather ad-hoc fashion.

The reader wishing to do so can easily turn our remarks into a
strictly dual treatment of facts (2.1)-(2.12) with the aid of
the setup introduced in Section 10.

(2.19) DEFINITION. An event $(\tau, x)$ in a real, continuous-time,
finite-dimensional, linear dynamical system $\Sigma = (F(\cdot), -, H(\cdot))$
is unobservable iff

$H(s)\Phi_F(s, \tau)x = 0$ for all $s \in [\tau, \infty)$.

(2.20) DEFINITION. With respect to the same system, an event
$(\tau, x)$ is unconstructible* iff

$H(\sigma)\Phi_F(\sigma, \tau)x = 0$ for all $\sigma \in (-\infty, \tau]$.

*In the older literature, starting with KALMAN [1960b,
Definition (5.23)], it is this concept which is called "observability".
By hindsight, the present choice of words seems to be more natural
to the writer.

The motivation for the first definition is obvious: the
"occurrence" of an unobservable event cannot be detected by looking
at the output of the system after time $\tau$. (The definition
subsumes $u(\cdot) \equiv 0$, but this is no loss of generality because of
linearity.) The motivation for the second definition is less
obvious but is in fact strongly suggested by statistical filtering
theory (see Section 10). In any case, Definition (2.20) complements
Definition (2.19) in exactly the same way as Definition (2.1)
complements Definition (2.2).

From these definitions, it is very easy to deduce the following
criteria:

(2.21) PROPOSITION. In a real, continuous-time, finite-dimensional,
linear dynamical system $\Sigma = (F(\cdot), -, H(\cdot))$ an event $(\tau, x)$ is

(a) unobservable if and only if $x \in$ kernel $M(\tau, t)$
for all $t \in R$, $t > \tau$, where

$M(\tau, t) = \int_\tau^t \Phi_F'(s, \tau)H'(s)H(s)\Phi_F(s, \tau)\,ds$;

(b) unconstructible if and only if $x \in$ kernel $M(s, \tau)$
for all $s \in R$, $s < \tau$, where

$M(s, \tau) = \int_s^\tau \Phi_F'(\sigma, \tau)H'(\sigma)H(\sigma)\Phi_F(\sigma, \tau)\,d\sigma$.

PROOF. Part (a) follows immediately from the observation:
$x \in$ kernel $M(\tau, t)$ $\Leftrightarrow$ $H(s)\Phi_F(s, \tau)x = 0$ for all $s \in [\tau, t]$. Part
(b) follows by an analogous argument. □


(2.22) REMARK. Let us compare this result with Proposition (2.3),
and let us indulge (only temporarily) in abuses of language of the
following sort:*

$(\tau, x)$ = unreachable $\Leftrightarrow$ $x \in$ kernel $W(\tau, t)$
for all $t > \tau$

and

$(\tau, x)$ = observable $\Leftrightarrow$ $x \in$ range $M(\tau, t)$
for some $t > \tau$.

From these relations we can easily deduce the so-called "duality
rules"; that is, problems involving observability (or constructibility)
are converted into problems involving reachability (or controllability)
in a suitably defined dual system. See KALMAN, FALB,
and ARBIB [1969, Chapter 2, Proposition (6.12)] and the broader
discussion in Section 10.


*All this would be strictly correct if we agreed to replace
"direct sum" in Proposition (2.11) and its counterpart (2.25) by
"orthogonal direct sum"; but this would be an arbitrary convention
which, while convenient, has no natural system-theoretic justification.
Reread Remark (2.13).

We will say, by slight abuse of language, that a system is
completely observable whenever 0 is the only unobservable state.
Thus the Basic Lemma (2.7) "dualizes" to the

(2.23) PROPOSITION. A real, continuous-time or discrete-time,
n-dimensional, linear, constant dynamical system $\Sigma = (F, -, H)$
is completely observable if and only if

(2.24) $\operatorname{rank}\,(H', F'H', \ldots, (F')^{n-1}H') = n$.
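Criterion (2.24) can be checked in the same way as (2.8). The sketch below is an illustration under stated assumptions: it stacks the rows $H, HF, \ldots, HF^{n-1}$, which is the transpose of the matrix in (2.24) and therefore has the same rank. The function names and the example matrices are hypothetical.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def rank(M):
    """Rank by Gauss-Jordan elimination in exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def observability_rows(F, H):
    """Rows H, HF, ..., HF^{n-1}: the transpose of the matrix in (2.24)."""
    rows, block = [], H
    for _ in range(len(F)):
        rows.extend(block)
        block = matmul(block, F)
    return rows

def is_completely_observable(F, H):
    return rank(observability_rows(F, H)) == len(F)

F = [[0, 1], [0, 0]]                              # hypothetical double integrator
print(is_completely_observable(F, [[1, 0]]))      # True: the first state is measured
print(is_completely_observable(F, [[0, 1]]))      # False: the first state never shows up
```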

By duality, complete constructibility in a continuous-time
system is equivalent to observability; in a discrete-time system
this is not true in general but it is true when $\det F \neq 0$.

It is easy to see also that (2.11) "dualizes" to:

(2.25) PROPOSITION. The state space $X_\Sigma$ of a real, continuous-time
or discrete-time, n-dimensional, linear, constant dynamical
system $\Sigma = (F, -, H)$ may be written as a direct sum

$X_\Sigma = X_1 \oplus X_2$

and the equations of $\Sigma$ are decomposed correspondingly as

$dx_1/dt = F_{11}x_1$,
$dx_2/dt = F_{21}x_1 + F_{22}x_2$,
$y(t) = H_1 x_1(t)$.

PROOF. Proceed dually to the proof of Proposition (2.11),
beginning with the definition of $X_2$ as the set of all unobservable
states of $\Sigma$. □

Combining Propositions (2.11) and (2.25) gives Theorem C as in
KALMAN [1962].


This completes our survey of the "classical" results related
to reachability, controllability, observability, and
constructibility.

The remaining lectures will be concerned exclusively with
discrete-time systems. The main motivation for the succeeding
developments will be the algebraic criteria (2.8) and (2.24)
as well as a deeper examination of Theorems C and D of the
Introduction.

3. DEFINITION OF STATES VIA NERODE EQUIVALENCE CLASSES

A classical dynamical system is essentially the action of the
time set T (= reals) on the states X. In other words, the
states are acted on by an abelian group, namely (R, + = usual
definition of addition). This is a trivial fact, but it has deep
consequences. A (modern) dynamical system is the action of the
inputs $\Omega$ on X; in exact analogy with the classical case, to
the abelian structure on T there corresponds an (associative
but noncommutative) semigroup structure on $\Omega$. The idea that $\Omega$
always admits such a structure was apparently overlooked until
the late 1950's when it became fashionable in automata theory
(school of SCHUTZENBERGER). This seems to be the "right" way
of translating the intuitive notion of dynamics into mathematics,
and it will be fundamental in our succeeding investigations.

It is convenient to assume from now on, until the end of
these lectures, that

(3.1) T = time set = Z = additive (ordered) group of
integers.

Since we shall be only interested in constant systems from
here on, we shall adopt the following normalization convention:*

(3.2) No element of $\Omega$ is defined for $t > \tau = 0$.

*In the discrete-time nonconstant case, we would have to deal
with Z copies of $\Omega$, each normalized with respect to a different
particular value of $\tau \in Z$.


R.  E.  Kalman 


In view of (3.2), we can define the "length" $|\omega|$ of $\omega$ by

$|\omega| = \min\,\{-t \in Z: \omega \text{ is not defined for any } s \le t\}$.

Before defining the semigroup on $\Omega$, we introduce another
fundamental notion of dynamics: the (left) shift operator $\sigma_\Omega^q$,
defined for all $q > 0$ in Z by

(3.3) $\sigma_\Omega^q: \Omega \to \Omega: \omega \mapsto (t \mapsto \omega(t + q))$.

Note that the definition of $\sigma_\Omega$ is compatible with the normalization
(3.2).

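If $J_\omega \cap J_{\omega'}$ = empty for $\omega, \omega' \in \Omega$, we define the join
of $\omega$ and $\omega'$ as the function

(3.4) $\omega \vee \omega' = \begin{cases} \omega & \text{on } J_\omega, \\ \omega' & \text{on } J_{\omega'}. \end{cases}$

When $\Omega$ has an additive structure, then we replace $\omega \vee \omega'$ by $\omega + \omega'$.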


(3.5) DEFINITION. There is an associative operation
$\circ: \Omega \times \Omega \to \Omega$, called concatenation, defined by

$\circ: (\omega, \nu) \mapsto \sigma_\Omega^{|\nu|}\omega \vee \nu$.

Note that, by (3.2) through (3.4), $\circ$ is well defined.

Note also that the asserted existence of concatenation rests
on the fact that $\Omega$ is made up of functions defined over finite
intervals in T. We might express the content of (3.5) also as:
$\Omega$ is a semigroup with valuation, since evidently $|\omega \circ \nu| = |\omega| + |\nu|$.
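The shift, join, and concatenation operations are simple to model for finite input sequences. In the sketch below, which is an illustration and not part of the lectures, an input is a Python dict from integer times to values whose domain is an interval ending at t = 0, in line with the normalization (3.2). The function names are assumptions.

```python
def as_function(samples):
    """View a list of samples as a partial function on Z whose domain ends at t = 0."""
    k = len(samples)
    return {t: samples[t + k - 1] for t in range(-k + 1, 1)}

def shift(omega, q):
    """Left shift (3.3): the shifted function maps t to omega(t + q)."""
    return {t - q: value for t, value in omega.items()}

def join(omega, nu):
    """Join (3.4) of two partial functions with disjoint domains."""
    assert not set(omega) & set(nu), "domains must be disjoint"
    return {**omega, **nu}

def length(omega):
    """Length of an input whose domain is an interval ending at t = 0."""
    return len(omega)

def concat(omega, nu):
    """Concatenation (3.5): shift omega left by |nu|, then join with nu."""
    return join(shift(omega, length(nu)), nu)

omega = as_function(["a", "b"])
nu = as_function(["c", "d", "e"])

print(concat(omega, nu) == as_function(["a", "b", "c", "d", "e"]))       # True
print(length(concat(omega, nu)) == length(omega) + length(nu))           # True
print(concat(concat(omega, nu), omega) == concat(omega, concat(nu, omega)))  # True
```

The last two lines check, on one example, the valuation property and associativity; the operation is visibly noncommutative, since `concat(omega, nu)` and `concat(nu, omega)` differ.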
In view of (3.5), it is natural to use an abbreviated notation*
also for the transition function, as follows:

(3.6) $x \circ \omega = \varphi(0; -|\omega|, x, \omega)$.

Now we come to an important nonclassical concept in dynamical
systems, whose evolution was strongly influenced by problems in
communications and automata theory: a discrete-time constant
input/output map

(3.7) $f: \Omega \to Y: \omega \mapsto f(\omega) = y(1)$.

We interpret this map as follows: y(1) is the output of some
system $\Sigma$ (say, a digital computer) when $\Sigma$ is subjected to
the (finite) input sequence $\omega$, assuming that $\Sigma$ is in some fixed
initial equilibrium state before the application of $\omega$. This
definition automatically incorporates the notions of "discrete-time"
as well as "causal" or "dynamics" (the latter because
y(t) is not defined for t < 1). However, (3.7) does not
clearly imply "constancy" (implicitly, however, this is clear from
the normalization assumption (3.2) on $\Omega$). To make the definition
more forceful, we extend f to the map

(3.8) $\bar{f}: \Omega \to \Gamma = Y \times Y \times \cdots$ (infinite cartesian product)

$: \omega \mapsto (f(\omega), f(\sigma_\Omega\omega), \ldots) = (y(1), y(2), \ldots)$.

Interpretation: $\bar{f}$ gives the output sequence $\gamma = (y(1), y(2), \ldots)$
of the system $\Sigma$ after t = 0 resulting from the application of an
input $\omega$ which stops at t = 0.

*Observe that $x \circ \omega$ is the strict analog of the notation xt
customary in topological dynamics. The action of $\omega$ on x satisfies
$x \circ (\omega \circ \nu) = (x \circ \omega) \circ \nu$ in view of (1.5).

This definition expresses causality more forcefully and
incorporates constancy, provided we define the (left) shift
operator $\sigma_\Gamma$ on $\Gamma$ so as to be compatible with (3.3). So,
for any $\tau > 0$, $\tau \in Z$, let

(3.9) $\sigma_\Gamma^\tau: \Gamma \to \Gamma: \gamma \mapsto (t \mapsto \gamma(t + \tau))$

$: (y(1), y(2), \ldots) \mapsto (y(\tau + 1), y(\tau + 2), \ldots)$.

Note: the operator $\sigma_\Omega$ "appends" an undefined term at 0, the
operator $\sigma_\Gamma$ "discards" the term y(1).

Now, dropping the bar over f, we adopt

(3.10) DEFINITION. A discrete-time, constant input/output map
(of some underlying dynamical system $\Sigma$) is any map f such that
the following diagram

                    f
          Ω ------------> Γ
          |               |
     σ_Ω  |               |  σ_Γ
          v               v
          Ω ------------> Γ
                    f

is commutative. We say that f is linear iff it is a K-vector
space homomorphism.

It will be convenient to regard (3.10) as the external
definition of a dynamical system, in contrast to the internal
definition set up in Section 1.

Intuitively, we should think of f as a highly idealized
kind of experimental data; namely, f incorporates all possible
information that could be gained by subjecting the underlying
system to experiments in which only input/output data is available.
This point of view is related to experimental physics the
same way as the classical notion of a dynamical system is related
to Newtonian (axiomatic) physics.

The  basic  question  which  motivates  much  of  what  will  follow 
can  now  be  formulated  as  follows: 

(3.11) PROBLEM OF REALIZATION. Given only the knowledge of
f (but of course also of Z, $\Omega$, and $\Gamma$) how can we discover,
in a mathematically consistent, rigorous, and natural way, the
properties of the system $\Sigma$ which is supposed to underlie the
given input/output map f?

This  suggests  immediately  the  following  fundamental  concept: 

(3.12) DEFINITION. A fixed dynamical system $\Sigma_0$ (internal
definition, as in Section 1) is a realization of a fixed input/output
map $f_0$ iff $f_0 = f_{\Sigma_0}$, that is, $f_0$ is identical with
the input/output map of $\Sigma_0$.

In view of the notations of Section 1 plus the special convention
(3.6), the explicit form of the realization condition is
simply that

(3.13) $f_0(\omega) = \eta_{\Sigma_0}(\varphi_{\Sigma_0}(0; -|\omega|, *, \omega))$

for all $\omega \in \Omega$. The symbol * stands for an arbitrary equilibrium
state in which $\Sigma_0$ remains, by definition, until the
application of $\omega$. (Later we simply take * to be 0.)
To solve the realization problem, the critical step is to
induce a definition of X (of some $\Sigma_0$) from the given $f_0$.

It is rather surprising that this step turns out to be trivial,
on the abstract level. (On the concrete level, however, there are
many unsolved problems in actually computing what X is. In
Section 8, we shall solve this problem, too, but only in the
linear case.) The essential idea seems to have been published
first by NERODE [1958]:

(3.14) DEFINITION. Make the concatenation semigroup $\Omega$ into
a monoid by adjoining a neutral element $\emptyset$ (which is the nowhere-defined
function on Z). Then $\omega \equiv_f \omega'$ (read: $\omega$ is Nerode
equivalent to $\omega'$ with respect to f) iff

$f(\omega \circ \nu) = f(\omega' \circ \nu)$ for all $\nu \in \Omega$.

There are many intuitive, physical, historical, and technical
reasons (which are scattered throughout the literature and concentrated
especially strongly in KALMAN, FALB, and ARBIB [1969]) for
using this as the

(3.15) MAIN DEFINITION. The set of equivalence classes under
$\equiv_f$, denoted as $X_f = \{(\omega)_f: \omega \in \Omega\}$, is the state set of the
input/output map f.
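To make Nerode equivalence concrete, the sketch below tests it by brute force for one hypothetical input/output map, $y(1) = \omega(0) + 2\omega(-1)$, with absent samples counted as 0. Inputs are Python lists whose last entry is the sample at t = 0, so concatenation is list concatenation. The map `f`, the function `nerode_equivalent`, and the bound `max_len` are assumptions; a bounded search like this is only evidence in general, but it is conclusive for this particular f, because any $\nu$ of length 2 or more erases all dependence on $\omega$.

```python
from itertools import product

def f(omega):
    """Hypothetical input/output map: y(1) = omega(0) + 2 * omega(-1)."""
    last = omega[-1] if len(omega) >= 1 else 0
    prev = omega[-2] if len(omega) >= 2 else 0
    return last + 2 * prev

def nerode_equivalent(w1, w2, alphabet=(0, 1), max_len=3):
    """Check f(w1 . nu) == f(w2 . nu) for every nu with |nu| <= max_len.
    The empty nu (the adjoined neutral element) is included."""
    for n in range(max_len + 1):
        for nu in product(alphabet, repeat=n):
            if f(list(w1) + list(nu)) != f(list(w2) + list(nu)):
                return False
    return True

print(nerode_equivalent([5, 1, 0], [1, 0]))  # True: same last two samples
print(nerode_equivalent([0, 0], [1, 0]))     # False: omega(-1) differs
print(nerode_equivalent([1], [0, 1]))        # True: a missing sample counts as 0
```

For this f, two inputs are equivalent exactly when their last two samples agree, so the state set can be identified with pairs of values.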

Let us verify immediately that (3.15) makes mathematical
sense:

(3.16) PROPOSITION. For each linear, constant input/output map
f there exists a dynamical system $\Sigma_f$ such that

(a) $\Sigma_f$ realizes f;

(b) $X_{\Sigma_f} = X_f$.

PROOF. We show how to induce $\Sigma_f$, given f. We
define the state set of $\Sigma_f$ by (b). Further, we define the
transition function of $\Sigma_f$ by

(3.17) $x \circ \nu = (\omega)_f \circ \nu = (\omega \circ \nu)_f$ for all $\nu \in \Omega$, $x \in X_f$.

We must check that $\circ$ on the left of = is well defined (note
two different uses of $\circ$!), that is, independent of the representation
of x as $(\omega)_f$. This follows trivially from (3.14).

Now we define the readout map of $\Sigma_f$ by

(3.18) $\eta_f: X_f \to Y: (\omega)_f \mapsto f(\omega)(1)$.

Again, this map is well defined since we can take $\nu = \emptyset$ as a
special case in (3.14). Then

$\eta_f(x \circ \nu) = \eta_f((\omega \circ \nu)_f) = f(\omega \circ \nu)$,

and the realization condition (3.6) is verified. Hence claim (a)
is correct. □

(3.19) COMMENTS. In automata theory, $\Sigma_f$ is known as the
reduced form of any system which realizes f. Clearly, any two
reduced forms are isomorphic, in the set-theoretic sense, since
the set $X_f$ is intrinsically defined by f. (This observation
is a weak version of Theorem D of the Introduction; here "uniqueness"
means "modulo a permutation of the labels of elements in
the set $X_f$".) Notice also that $\Sigma_f$ is completely reachable
since, by Definition (3.15), every element $x = (\omega)_f$ of $X_f$
is reachable via any element $\omega'$ in the Nerode equivalence class
$(\omega)_f$. As to observability of $\Sigma_f$, see Section 10.
4. MODULES INDUCED BY LINEAR INPUT/OUTPUT MAPS

We are now ready to embark on the main topics of these lectures.

It is assumed that the reader is conversant with modern algebra
(especially: abelian groups, commutative rings, fields, modules, the ring
of polynomials in one variable, and the theory of elementary divisors),
on the level of, say, VAN DER WAERDEN, LANG [1965], HU [1965] or
ZARISKI and SAMUEL [1958, Vol. 1]. The material covered from here
on dates from 1965 or later.

Standing  assumptions  until  Section  10: 

(4.1) All systems $\Sigma = (F, G, H)$ are discrete-time, linear,
constant, defined over a fixed field K (but not necessarily
finite-dimensional).

Our immediate objective is to provide the setup and proof for the

(4.2) FUNDAMENTAL THEOREM OF LINEAR SYSTEM THEORY. The natural
state set $X_f$ associated with a discrete-time, linear, constant
input/output map f over a fixed field K admits the structure of a finitely
generated module over the ring K[z] of polynomials (with indeterminate
z and coefficients in K).

(4.3) COMMENTS. Since the ring K[z] will be seen to be related
to the inputs to $\Sigma$, this result has a superficial resemblance to the
fact that in an arbitrary dynamical system $\Sigma$ the state set $X_\Sigma$ admits
the action of a semigroup, namely $\Omega$ (see (3.5) and related footnote).

It turns out, however, that this action of $\Omega$ on X, which results
from combining the concatenation product in $\Omega$ with the definition of
states via Nerode equivalence, is incompatible with the additive
structure of $\Omega$ [KALMAN, 1967, Section 3]. Our theorem asserts the
existence of an entirely different kind of structure of X. This
structure, that of a K[z]-module, is not just a consequence of
dynamics, but depends critically on the additive structure on $\Omega$
and on the linearity of f. The relevant multiplication is not
(noncommutative) concatenation but (commutative) convolution (because
convolution is the natural product in K[z]); dynamics is thereby
restated in such a way that the tools of commutative algebra become
applicable. In a certain rather definite sense (see also Remark
(4.30)), Theorem (4.2) expresses the algebraic content of the method
of the Laplace transformation, especially as regards the practices
developed in electrical engineering in the U.S. during the 1950's.

The proof of Theorem (4.2) consists in a long sequence of canonical
constructions and the verification that everything is well defined
and works as needed.

In view of (4.1) and the conventions made in Section 1, $\Omega$ may
be viewed as a K-vector space and $\omega(t) = 0$ for almost all $t \in Z$
and all $\omega \in \Omega$. By convention (3.2), we have assumed also that
$\omega(t) = 0$ for all t > 0. As a result, we have that:

(a) $\Omega \approx K^m[z]$ as a K-vector space. Let us exhibit the isomorphism
explicitly as follows:

(4.4) $\omega \approx \sum_{t \in Z} \omega(t) z^{-t} \in K^m[z]$.

By (3.2), the sum in (4.4) is always finite. The isomorphism
obviously preserves the K-linear structure on $\Omega$. In the sequel, we
shall not distinguish sharply between $\omega$ as a function $T \to K^m$ and
$\omega$ as an m-vector polynomial.

(b) $\Omega$ is a free K[z]-module with m generators, that is,
$\Omega \approx K^m[z]$ also in the K[z]-module sense. In fact, we define the
action of K[z] on $\Omega$ by scalar multiplication as

$\cdot: K[z] \times \Omega \to \Omega: (\pi, \omega) \mapsto \pi \cdot \omega$,

where

(4.5) $\pi \cdot \omega = (\pi\omega_1, \ldots, \pi\omega_m)'$  ($\omega_j \in K[z]$, j = 1, ..., m).

The product of $\pi$ with the components of the vector $\omega$ is the
product in K[z]. We write the scalar product on the left, to avoid
any confusion with notation (3.6). It is easy to see that the module
axioms are verified; $\Omega$ is obviously free, with generators

(4.6) $e_j = (0, \ldots, 0, 1, 0, \ldots, 0)'$  (1 in the j-th position), j = 1, ..., m.

(c) On $\Omega$ the action of the shift operator $\sigma_\Omega$ is represented
by multiplication by z. This, of course, is the main reason for
introducing the isomorphism (4.4) in the first place.
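Claims (a) and (c) can be checked on a scalar example (m = 1). In this sketch, an input is a list of samples whose last entry is $\omega(0)$, and a polynomial is a list of coefficients in which entry j multiplies $z^j$; both conventions, and the function names, are assumptions made for illustration.

```python
def to_poly(samples):
    """Isomorphism (4.4) for m = 1: the coefficient of z**j is omega(-j)."""
    return list(reversed(samples))

def shift_left(samples):
    """sigma_Omega: every sample moves one step earlier; the vacated value at
    t = 0 is 0 (in the linear setting an undefined value is treated as zero)."""
    return list(samples) + [0]

def times_z(poly):
    """Multiply a polynomial (coefficient list, lowest degree first) by z."""
    return [0] + list(poly)

omega = [3, 1, 4]                 # omega(-2) = 3, omega(-1) = 1, omega(0) = 4
print(to_poly(omega))             # [4, 1, 3], i.e. 4 + z + 3 z^2
print(to_poly(shift_left(omega)) == times_z(to_poly(omega)))  # True
```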
(d) Each element of $\Gamma$ is a formal power series in $z^{-1}$. In fact,
(4.4) suggests viewing $z^t$ as an abstract representation of $-t \in Z$;
hence we define

(4.7) $\gamma \approx \sum_{t \in Z} \gamma(t) z^{-t}$.

By (3.8) and (4.1), $\gamma(t) \in K^p$ for each $t \ge 1$ and is zero (or
not defined) for t < 1. In general the sum is taken over infinitely many
nonzero terms; there is no question of convergence and the right-hand side
of (4.7) is to be interpreted strictly algebraically as a formal power
series. Since $\gamma(0)$ is always zero (see (3.8)), we can say also
that

(e) $\Gamma$ is isomorphic to the K-vector subspace of $K^p[[z^{-1}]]$
(formal power series in $z^{-1}$ with coefficients in $K^p$) consisting
of all power series with 0 first term.
The  first  nontrivial  construction  is  the  following: 

(f) $\Gamma$ has the structure of a K[z]-module, with scalar
multiplication defined as

(4.8) $\cdot: K[z] \times \Gamma \to \Gamma: (\pi, \gamma) \mapsto \pi \cdot \gamma = \pi(\sigma_\Gamma)\gamma$.

This product may be interpreted as the ordinary product of a power
series in $z^{-1}$ by a polynomial in z, followed by the deletion of
all terms containing no negative powers of z. The verification of
the module axioms is straightforward.
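The action (4.8) can be tried out on truncated output sequences. The sketch below is an illustration under stated assumptions: an output sequence is a finite list starting with y(1), a polynomial is a coefficient list with entry k multiplying $z^k$, and only the first `len(gamma) - deg(pi)` terms of the result are reported, since later terms would need samples beyond the truncation. The names `shift` and `act` are hypothetical.

```python
def shift(gamma, k):
    """sigma_Gamma^k: discard the first k terms y(1), ..., y(k)."""
    return gamma[k:]

def act(pi, gamma):
    """(4.8) on a truncated sequence: pi . gamma = pi(sigma_Gamma) gamma.
    pi[k] is the coefficient of z**k."""
    n_out = len(gamma) - (len(pi) - 1)
    return [sum(c * shift(gamma, k)[i] for k, c in enumerate(pi))
            for i in range(n_out)]

gamma = [1, 2, 4, 8, 16]          # y(t) = 2**(t - 1), truncated after five terms

print(act([0, 1], gamma))         # z . gamma discards y(1): [2, 4, 8, 16]
print(act([-2, 1], gamma))        # (z - 2) . gamma: [0, 0, 0, 0]
```

The second result agrees with the power-series description: $(z - 2)(z^{-1} + 2z^{-2} + 4z^{-3} + \cdots) = 1$, and deleting the term with no negative power of z leaves 0.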
(g) f is a K[z]-homomorphism. This is an immediate consequence
of the fact that f is constant (see (3.10)) and that multiplication
by z corresponds to the left shift operators on $\Omega$ and $\Gamma$.

(h) The Nerode equivalence classes of f are isomorphic with
$\Omega$/kernel f. This is an easy but highly nontrivial lemma, connecting
Nerode equivalence with the module structure on $\Omega$. The proof is an
immediate consequence of the formula

(4.9) $\omega \circ \nu = z^{|\nu|} \cdot \omega + \nu$.

In fact, by K-linearity of f, (4.9) implies

$f(\omega \circ \nu) = f(\omega' \circ \nu)$ for all $\nu \in \Omega$

if and only if

$f(z^k \cdot \omega) = f(z^k \cdot \omega')$ for all $k \ge 0$ in Z,

that is, by (3.8), if and only if $\omega$ and $\omega'$ produce the same output
sequence, or $\omega - \omega' \in$ kernel f.

The proof of Theorem (4.2) is now complete, since the last
lemma identifies $X_f$ as defined by (3.15) with the K[z] quotient
module $\Omega$/kernel f.
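Formula (4.9), on which the lemma rests, can be verified directly for scalar inputs. The sketch assumes the same conventions as the isomorphism (4.4) with m = 1: an input is a list of samples ending at t = 0, and a polynomial is a coefficient list with entry j multiplying $z^j$. The helper names are illustrative.

```python
def to_poly(samples):
    """Isomorphism (4.4) for m = 1: the coefficient of z**j is omega(-j)."""
    return list(reversed(samples))

def poly_add(p, q):
    """Sum of two polynomials given as coefficient lists, lowest degree first."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(n)]

def times_z_power(p, k):
    """Multiply a polynomial by z**k."""
    return [0] * k + list(p)

omega = [3, 1]        # omega(-1) = 3, omega(0) = 1
nu = [4, 1, 5]        # nu(-2) = 4, nu(-1) = 1, nu(0) = 5

lhs = to_poly(omega + nu)                                   # omega . nu as a polynomial
rhs = poly_add(times_z_power(to_poly(omega), len(nu)), to_poly(nu))
print(lhs, rhs, lhs == rhs)   # [5, 1, 4, 1, 3] [5, 1, 4, 1, 3] True
```

Concatenation itself is not additive in $\omega$ and $\nu$ jointly, but (4.9) rewrites it with the two module operations, which is what lets the linearity of f be applied.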

We write elements of the latter as $[\omega]_f = \omega$ + kernel f; then
it is clear that $X_f$ as a K[z]-module is generated by $[e_1]_f, \ldots, [e_m]_f$
since $\Omega$ itself is generated by $e_1, \ldots, e_m$ (see (4.6)). Note also
that the scalar product in $\Omega$/kernel f is

(4.10) $(\pi, [\omega]_f) \mapsto \pi \cdot [\omega]_f = [\pi \cdot \omega]_f$.

The last product above (that in $\Omega$) has already been defined in (4.5).
The reader should verify directly that (4.10) gives a well-defined
scalar product.
(4.11) REMARK. There is a strict duality in the setup used to
define $X_f$. From the point of view of homological algebra [MAC LANE
1963], this duality looks as follows. Since every free module is
projective, the natural map

$\mu: \Omega \to X_f: \omega \mapsto [\omega]_f$

exhibits $X_f$ as the image of a projective module. On the other
hand, there is a bijection between the set $X_f$ and the set

$f(\Omega) \subset \Gamma$.

$f(\Omega)$ is clearly a K[z]-submodule of $\Gamma$ (with $z \cdot f(\omega) = f(z \cdot \omega)$),
and so $X_f$ and $f(\Omega)$ are isomorphic also as K[z]-modules. It is
known that $\Gamma$ is an injective module [MAC LANE 1963, page 95,
Exercise 2]. So the natural map $X_f \to \Gamma: [\omega]_f \mapsto f(\omega)$ exhibits $X_f$
as a submodule of an injective module. This fact is basic in the
construction of the "transfer function" associated with f (Section 7),
but its full implications are not yet understood at present.

There is an easy counterpart of Theorem (4.2) which concerns a
dynamical system given in "internal" form:

(4.12) PROPOSITION. The state set $X_\Sigma$ of every discrete-time,
finite-dimensional, linear, constant dynamical system $\Sigma = (F, G, -)$
admits the structure of a K[z]-module.

PROOF. By definition (see (1.10)), $X_\Sigma = K^n$ is already a
K-vector space. We make it into a K[z]-module by defining

(4.13) $\cdot: K[z] \times K^n \to K^n: (\pi, x) \mapsto \pi(F)x$. □
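The scalar multiplication (4.13) is easy to compute with Horner's rule. The sketch below is illustrative: polynomials are coefficient lists with entry k multiplying $z^k$, and the matrix F is a hypothetical companion matrix with characteristic polynomial $z^2 + 3z + 2$.

```python
def matvec(F, x):
    """Matrix-vector product, F given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in F]

def act(pi, F, x):
    """(4.13): pi . x = pi(F) x, evaluated by Horner's rule.
    pi[k] is the coefficient of z**k."""
    result = [0] * len(x)
    for c in reversed(pi):
        # result <- F * result + c * x
        result = [r + c * xi for r, xi in zip(matvec(F, result), x)]
    return result

F = [[0, 1], [-2, -3]]
x = [1, 0]

print(act([0, 1], F, x))      # z . x = F x = [0, -2]
print(act([2, 3, 1], F, x))   # (z^2 + 3z + 2) . x = [0, 0]
```

The second result reflects the Cayley-Hamilton theorem: the characteristic polynomial of F acts as zero on every state, so in the finite-dimensional case every element of the module is annihilated by a nonzero polynomial.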

(4.14) COMMENT. The construction used in the proof of (4.12) is
the classical trick of studying the properties of a fixed linear map
$F: K^n \to K^n$ via the K[z]-module structure that F induces on
$K^n$ by (4.13). In view of the canonical construction of $\Sigma_f$ provided by
Proposition (3.16), the state set X can be treated as a K[z]-module
irrespective as to whether X is constructed from f ($X = X_f$)
or given a priori as part of the specification of $\Sigma$ ($X = X_\Sigma$). Thus
the K[z]-module structure on X is a nice way of uniting the "external"
and the "internal" definitions of a dynamical system. Henceforth we
shall talk about a (discrete-time, linear, constant dynamical) system
$\Sigma$ somewhat imprecisely via properties of its associated K[z]-module $X_\Sigma$.

We shall now give some examples of using module-theoretic language
to express standard facts encountered before.

(4.15) PROPOSITION. If X is the state-module of $\Sigma$, the map
$F_\Sigma$ is given by $X \to X: x \mapsto z \cdot x$.

PROOF. This is obvious from (4.13) if $X = X_\Sigma$. If $X = X_f = X_{\Sigma_f}$,
then we find that, by (1.17),

$x(1) = Fx(0) + G\omega(0)$,
$\phantom{x(1)} = F[\xi]_f + G\omega(0)$;

since x(0) results from input $\xi$, x(1) results from input $z \cdot \xi + \omega(0)$,
so that

$x(1) = [z \cdot \xi + \omega(0)]_f$
$\phantom{x(1)} = z \cdot [\xi]_f + G\omega(0)$.

So the assertion is again verified. □


Now we can replace Proposition (2.14) by the much more elegant

(4.16) PROPOSITION. A system $\Sigma = (F, G, -)$ is completely reachable
if and only if the columns of G generate $X_\Sigma$.

PROOF. The claim is that complete reachability is equivalent
to the fact that every element $x \in X_\Sigma$ is expressible as

$x = \sum_{j=1}^m \pi_j \cdot g_j$,  $\pi_j \in K[z]$,  $G = [g_1, \ldots, g_m]$.

In view of (4.15), this is the same as requiring that x be expressible
as

$x = \sum_{j=1}^m \pi_j(F) g_j$;

this last condition is equivalent to complete reachability by (2.14). □

(4.17) COROLLARY. The reachable states of $\Sigma$ are precisely
those of the submodule of $X_\Sigma$ generated by (the columns of) G.

(4.18) REMARK. The statement that "$\Sigma$ is not completely reachable"
simply means that X is not generated by those vectors which make up
the matrix G in the specification of the input side of the system $\Sigma$.
It does not follow that X cannot be finitely generated by some other
vectors. In fact, to avoid unnecessary generality, we shall henceforth
assume that

X is always finitely generated over K[z].

From the system-theoretic point of view, the case when we need
infinitely many generators, that is, infinitely many input channels,
seems rather bizarre at present.

(4.19) PROPOSITION. The system $\Sigma_f$ is completely reachable.

PROOF. Obvious from the notation: a state $x = [\xi]_f$
is reached by $\xi \in \Omega$. □

(4.20) PROPOSITION. The system $\Sigma_f$ is completely observable.

PROOF. Obvious from Lemma (h) above: $f(\omega) = 0$
iff $\omega \in [0]_f$, which says that the only unobservable state of $X_f$
is $0 \in X_f$. □

Let us generalize the last result to obtain a module-theoretic criterion
for complete observability.  There are two technically different ways of
doing this.  The first depends on the observation that the "dual" of a
submodule (see Corollary (4.17)) is a quotient module.  The second defines
observability via the "dual" system (F', H', -) associated with (F, -, H).

Consider a dynamical system $\Sigma = (F, -, H)$ and the corresponding
K[z]-module $X_F$ and K-homomorphism $H\colon X_F \to Y = K^p$.  We can extend H



to a K[z]-homomorphism $\bar H$ (look back at (?.8)) by setting

$$\bar H\colon X_F \to \Gamma\colon x \mapsto (Hx,\ H(z\cdot x),\ H(z^2\cdot x),\ \ldots).$$

From Definition (2.19) we see that no nonzero element of the quotient
module $X_F$/kernel $\bar H$ is unobservable.  Hence, by abuse of language, we
can say that $X_F$/kernel $\bar H$ is the module of observable states of $\Sigma$.

Thus we arrive at phrasing the counterparts of (4.16-17) in the following
language:

(4.21)  PROPOSITION.  A system $\Sigma = (F, -, H)$ is completely observable
if and only if the quotient module $X_F$/kernel $\bar H$ is isomorphic with $X_F$.

(4.22)  COROLLARY.  The observable states of $\Sigma$ are to be identified
with the elements of the quotient module $X_F$/kernel $\bar H$.

(4.23)  TERMINOLOGY.  The preceding considerations suggest viewing
a system $\Sigma$ as essentially the same "thing" as a module X.  Strictly
speaking, however, knowing $\Sigma = (F, G, H)$ gives us not only $X_F$
(see (4.13)) but also a quotient module $X_\Sigma$ (over kernel $\bar H$) of a
submodule (that generated by G) of $X_F$, that is

$$X_\Sigma = K[z]G/\mathrm{kernel}\ \bar H.$$

If $X_\Sigma = X_F$ we say that $\Sigma$ is canonical (relative to the given G, H).

To be more precise, let us observe the following stronger version
of (4.19-20):

(4.24)  CORRESPONDENCE THEOREM.  There is a bijective correspondence
between K[z]-homomorphisms $f\colon \Omega \to \Gamma$ and the equivalence class of
completely reachable and completely observable systems $\Sigma$ modulo a
basis change in $X_F$.

Detailed discussion of this result is postponed until Section 7.

A stricter observation of the "duality principle" leads to

(4.25)  DEFINITION.  The K-linear dual of $\Sigma = (F, G, H)$ is
$\Sigma^* = (F', H', G')$ (' = matrix transposition).  The states of
$\Sigma^*$ are called costates of $\Sigma$.

The following fact is an immediate consequence of this definition:

(4.26)  PROPOSITION.  The state set $X_F^*$ of $\Sigma^*$ may be given the
structure of a K[z]-module, as follows: (i) as a vector space $X_F^*$
is the dual of $X_F$ regarded as a K-vector space, (ii) the scalar
product in $X_F^*$ is defined by

$$(z\cdot x^*)(x) = x^*(Fx).$$

(4.26a)  REMARK.  We cannot define $X_F^*$ as $\mathrm{Hom}_{K[z]}(X_F, K[z])$, equal to the
K[z]-linear dual of $X_F$, because every torsion module M over an integral
domain D has a trivial D-dual.  However, the reader can verify (using
the ideas to be developed in Section 6) that $X_F^*$ defined above is
isomorphic with $\mathrm{Hom}_{K[z]}(X_F, K(z)/K[z])$.  See BOURBAKI [Algèbre, Chapter 7
(2e éd.), Section 4, No. 8].

Now we verify easily the following dual statements of (4.16-17):

(4.27)  PROPOSITION.  A system $\Sigma = (F, -, H)$ is completely observable
if and only if H' generates $X_F^*$.

(4.28)  COROLLARY.  The observable costates of $\Sigma$ are precisely
the reachable states of $\Sigma^*$, that is, those of the submodule of
$X_F^*$ generated by H'.
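
As an illustration of (4.27), observability of (F, -, H) can be tested by running the reachability test on the dual pair (F', H'). The sketch below assumes K is the field of rationals and a finite-dimensional state space; the function names are hypothetical.

```python
from fractions import Fraction

def transpose(A):
    """Matrix transpose for a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def rank(rows):
    """Rank by Gauss-Jordan elimination with exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def completely_reachable(F, G):
    """rank [G, FG, ..., F^(n-1) G] == n."""
    blocks, cur = [], G
    for _ in range(len(F)):
        blocks.append(cur)
        cur = matmul(F, cur)
    R = [[x for blk in blocks for x in blk[i]] for i in range(len(F))]
    return rank(R) == len(F)

def completely_observable(F, H):
    """(F, -, H) is observable iff the dual pair (F', H') is reachable."""
    return completely_reachable(transpose(F), transpose(H))

F = [[0, 1], [0, 0]]
print(completely_observable(F, [[1, 0]]))  # output reads the first coordinate
print(completely_observable(F, [[0, 1]]))  # output reads the second coordinate
```

Reading only the second coordinate of this nilpotent system never reveals the first one, so the second pair fails the test.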

We have eliminated the abuse of language incurred by talking
about "observable states" through introduction of the new notion of
"observable costates".  The full explication of why this is necessary
(as well as natural) is postponed until Section 10.

The preceding simple facts depend only on the notion of a module
and are immediate once we recognize the fact that F may be eliminated
from statements such as (2.8) by passing to the module induced by F
via (4.13).  But module theory yields many other, less obvious results
as well, which derive mainly from the fact that K[z] is a principal-
ideal domain.

We recall: an element m of an R-module M (R = arbitrary
commutative ring) has torsion iff there is a nonzero $r \in R$ such that
$r\cdot m = 0$.  If this is not the case, m is free.  Similarly, M is
said to be a torsion module iff every element of M has torsion.
M is a free module if no nonzero element has torsion.  If $L \subset M$
is any subset of M, the annihilator $A_L$ of L is the set

$$A_L = \{r:\ r\cdot l = 0 \text{ for all } l \in L\};$$

it follows immediately that $A_L$ is an ideal in R.  Note also that
the statement "M is a torsion module" does not imply in general
that $A_M$ is nontrivial, that is, $\neq 0$.  (Counterexample: take
an M which is not finitely generated.)

Coupling these notions with the special fact that, for us,
R = K[z], we get a number of interesting system-theoretic results:

(4.29)  PROPOSITION.  $\Sigma$ is finite-dimensional if and only if $X_F$
is a torsion K[z]-module.

COROLLARY.  If $X_F$ is free, $\Sigma$ is infinite-dimensional.

PROOF.  We recall that "$\Sigma$ = finite-dimensional" is defined
to be "$X_F$ = finite-dimensional as a K-vector space".  See (1.18).


Sufficiency.  By assumption $X_F$ is finitely generated
by, say, q nonzero elements $x_1, \ldots, x_q$ of $X_F$ (which are not
necessarily the columns of G).  Hence

$$A_{X_F} = A_{x_1} \cap \ldots \cap A_{x_q}.$$

Since K[z] is a principal-ideal domain, each of the $A_{x_j}$ is a principal
ideal, say, $\psi_j K[z]$ with $\psi_j \in K[z]$.  If $X_F$ is a torsion module,
then $\deg \psi_j = n_j > 0$ for all j = 1, ..., q.  For otherwise $\psi_j$
is either zero (and then $x_j$ is free, which is a contradiction) or
a unit, which implies $x_j = 0$, contrary to assumption.  Hence we can
replace each expression

$$x = \sum_{j=1}^{q} \xi_j \cdot x_j, \qquad \xi_j \in K[z],$$

by the simpler one

$$x = \sum_{j=1}^{q} [\xi_j\ (\mathrm{mod}\ \psi_j)] \cdot x_j,$$

which shows that $X_F$, as a K-module, is generated by the finite set

$$x_1,\ z\cdot x_1,\ \ldots,\ z^{n_1 - 1}\cdot x_1,\ x_2,\ \ldots,\ z^{n_q - 1}\cdot x_q.$$

Necessity.  Let $\psi_F$ be the minimal polynomial of the map
$F\colon x \mapsto z\cdot x$.  If $X_F$ is finite-dimensional as a K-module, $\deg \psi_F > 0$.
This means (by the usual definition of the minimal polynomial in matrix
theory or more generally in linear algebra) that $\psi_F$ annihilates every
$x \in X_F$ so that $X_F$ is a torsion K[z]-module.  □

Notice, from the second half of the proof, that the notion of a
minimal polynomial can be extended from K-linear algebra to K[z]-modules.
In fact, the same argument gives us also the well-known

(4.30)  PROPOSITION.  Every finitely generated torsion module M
over a principal-ideal domain R has a nontrivial minimal polynomial
$\psi_M$ given by $A_M = \psi_M R$.

(4.31)  COROLLARY.  If a K[z]-module X is finitely generated with
q generators and minimal polynomial $\psi_X$, then

$$\dim X \ (\text{as K-vector space}) \leq q\cdot\deg \psi_X.$$
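
The bound (4.31) is easy to evaluate once $\deg \psi_X$ is known. The sketch below is an illustration under stated assumptions: K is the field of rationals, the module is given by a matrix F, and the degree of the minimal polynomial is found as the first k for which $F^k$ depends linearly on $I, F, \ldots, F^{k-1}$. The function names are hypothetical.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def rank(rows):
    """Rank by Gauss-Jordan elimination with exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def min_poly_degree(F):
    """Smallest k such that F^k is a K-linear combination of I, F, ..., F^(k-1)."""
    n = len(F)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # F^0 = I
    flat = [[x for row in power for x in row]]
    for k in range(1, n + 1):
        power = matmul(power, F)
        flat.append([x for row in power for x in row])
        if rank(flat) < len(flat):
            return k
    raise AssertionError("unreachable: Cayley-Hamilton bounds the degree by n")

def dimension_bound(F, q):
    """Right-hand side of (4.31) for a module with q generators."""
    return q * min_poly_degree(F)

identity3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
shift3 = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
print(min_poly_degree(identity3), dimension_bound(identity3, 2))
print(min_poly_degree(shift3), dimension_bound(shift3, 1))
```

For the 3 × 3 identity the minimal polynomial has degree 1, so two generators can account for at most two dimensions: no pair (F, G) with F = I and only two input columns can be completely reachable. For the nilpotent shift a single generator already allows dimension 3.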

(4.32)  REMARK.  The fact that $X_f$ is completely reachable and is
therefore generated by m vectors allows us to estimate the dimension
of $\Sigma$ by (4.31) knowing only $\deg \psi_{X_f}$ but without having computed
$X_f$ itself.  (Knowing $X_f$ explicitly means knowing $F\colon x \mapsto z\cdot x$, etc.)
In other words, the module-theoretic setup considerably enhances the
content of Proposition (3.16).  Guided by these observations, we shall
develop in Section 8 explicit algorithms for calculating $\dim X_f$ directly
from f without first having to compute F.

(4.33)  PROPOSITION.  If $X_F$ is a free K[z]-module, no state of
$\Sigma$ can be simultaneously reachable and controllable.

PROOF.  We recall that "$X_F$ = free" means that $X_F$ is
(isomorphic to) a finite sum of copies of K[z].  Suppose for
simplicity that $X_F = K[z]$.  Then x = reachable means that $x = \xi\cdot 1$
for some $\xi \in K[z]$.  Similarly, x = controllable means that
$z^{|\omega|}\cdot x + \omega\cdot 1 = 0$ for some $\omega \in K[z]$.  Hence if x has both properties,

$$(z^{|\omega|}\xi + \omega)\cdot 1 = (\xi \circ \omega)\cdot 1 = 0.$$

This shows that 1 is annihilated by $\xi \circ \omega$, the input $\xi$ followed
by $\omega$, which contradicts the assumption that $X_F$ is free.  □

The most important consequence of Theorem (4.2) is due to the
fact that through it we can apply to linear dynamical systems the well-known

(4.34)  FUNDAMENTAL STRUCTURE THEOREM FOR FINITELY GENERATED MODULES
OVER A PRINCIPAL IDEAL DOMAIN R (Invariant Factor Theorem for Modules).
Every such module M with m generators is isomorphic to

(4.35)  $$R/\psi_1 R \oplus \ldots \oplus R/\psi_r R \oplus R^s$$

where the $R/\psi_i R$ are quotient rings of R viewed as modules over R,
the $\psi_i$ (called the invariant factors of M) are uniquely determined
by M up to units in R, $\psi_i \mid \psi_{i-1}$, i = 2, ..., r, and, as usual, $R^s$
denotes the free R-module with s generators; finally, $r + s \leq m$.

Various proofs of this theorem are referenced in KALMAN, FALB,
and ARBIB [1969, page 270], and one is given later in Section 6.

Note: The divisibility conditions imply that M is a torsion
module iff s = 0 and then $\psi_M = \psi_1$.

One important consequence of this theorem (others in Section 7)
is that it gives us the most general situation when $X_\Sigma$ is not a
torsion module.  For instance, combining (4.33) with (4.34), we
get

(4.36)  PROPOSITION.  A system cannot be simultaneously completely
reachable and completely controllable if its K[z]-module X has any
infinite-dimensional components (i.e., s > 0 in (4.35)).

(4.37)  REMARK.  Although our entire development in this section may
be regarded as a deep examination of Proposition (2.14), most of our
comments apply equally well to (2.7), since both statements rest on
the algebraic condition (2.8).  In fact, the only remaining
thing to be "algebraized" is the notion of "continuous-time".  We
shall not do this here.  Once this last step is taken, the algebraization
of the Laplace transform (as related to ordinary linear differential
equations) will be complete.

5.  CYCLICITY AND RELATED QUESTIONS

We recall that an R-module M (R = arbitrary ring) is cyclic
iff there is an element $m \in M$ such that M = Rm.  [It would be
better to say that such a module is monogenic: generated by one
element m.]

If M is cyclic, the map $R \to M\colon r \mapsto r\cdot m$ is an epimorphism
and has kernel $A_m$, the annihilating ideal of m.  This plus the
homomorphism theorem gives the well-known

(5.1)  PROPOSITION.  Every cyclic R-module M with generator m
is isomorphic with the quotient ring $R/A_m$ viewed as an R-module.

This result is much more interesting when, as in our case, R
is not only commutative and a principal-ideal domain, but specifically
the polynomial ring K[z].

So let X be a cyclic K[z]-module with generator g and let
$A_g = \psi_g K[z]$, where $\psi_g$ is the minimal or annihilating polynomial of
g.  By commutativity and cyclicity, $A_g = A_X$.  Hence $\psi_g$ is a minimal
polynomial also for X.  Write $\psi = \psi_g = \psi_X$.  In view of (5.1),
$X \approx K[z]/\psi K[z]$.  Let us recall some features of the ring $K[z]/\psi K[z]$:

(i)  Its elements are the residue classes of polynomials $\pi$ (mod $\psi$),
$\pi \in K[z]$.  Write these as $[\pi]_\psi$ or $[\pi]$.  Multiplication is defined as
$[\pi]\cdot[\sigma] = [\pi\sigma]$.

(ii)  Each $[\pi]$ is either a unit or a divisor of zero.  In fact,
$[\pi]$ is a unit iff $(\pi, \psi)$ = greatest common divisor of $\pi$, $\psi$ is a
unit in K[z] (that is, $(\pi, \psi) \in K$).  Then

$$\sigma\pi + \tau\psi = 1 \qquad (\sigma, \tau \in K[z])$$

so that $[\sigma]$ is the inverse of $[\pi]$.  On the other hand, if
$(\pi, \psi) = \theta \neq$ unit in K[z], then both $[\pi]$ and $[\psi/\theta]$ are zero
divisors since $[\pi]\cdot[\psi/\theta] = [(\pi/\theta)\psi] = 0$.

(iii)  If $\psi$ is a prime in K[z] (that is, an irreducible polynomial
with respect to coefficients over the ground field K), then
by (ii) $K[z]/\psi K[z]$ is a field.  This is a very standard construction
in algebraic number theory.
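
A small worked instance of (ii) may help. The choice $K = \mathbb{Q}$ and $\psi(z) = z^2 - 1$ below is ours, made only for illustration; it shows one unit together with its inverse and one pair of zero divisors in $K[z]/\psi K[z]$.

```latex
% Illustrative example (assumed data: K = \mathbb{Q}, \psi(z) = z^2 - 1).
\begin{align*}
\pi = z + 2:\quad & (\pi, \psi) = 1, \qquad
  \tfrac{1}{3}(2 - z)\,\pi + \tfrac{1}{3}\,\psi
  = \tfrac{1}{3}(4 - z^2) + \tfrac{1}{3}(z^2 - 1) = 1, \\
  & \text{so } \sigma = \tfrac{1}{3}(2 - z),\ \tau = \tfrac{1}{3},
  \text{ and } [\pi]^{-1} = [\sigma]; \\[4pt]
\pi = z - 1:\quad & (\pi, \psi) = \theta = z - 1, \qquad
  [\pi]\cdot[\psi/\theta] = [(z - 1)(z + 1)] = [\psi] = 0 .
\end{align*}
```

Since this $\psi$ factors as $(z - 1)(z + 1)$, it is not prime, and the quotient ring is accordingly not a field, in agreement with (iii).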

Since it is awkward to compute with equivalence classes $[\pi]$, we
shall often prefer to work with the standard representative of $[\pi]$,
namely a polynomial $\tilde\pi$ of least degree in $[\pi]$.  $\tilde\pi$ is uniquely
determined by $[\pi]$ and the condition $\deg \tilde\pi < \deg \psi$.  Henceforth ~ will
always be used in this sense.

The next two assertions are immediate:

(5.2)  PROPOSITION.  $K[z]/\psi K[z]$ as a K-vector space is isomorphic
to the K-vector space $\Omega^{(n)} = \{\xi \in K[z]:\ \deg \xi < n = \deg \psi\}$.
$K[z]/\psi K[z]$ is also isomorphic to $\Omega^{(n)}$ as a K[z]-module, provided
we define the scalar product in $\Omega^{(n)}$ by $(\pi, \xi) \mapsto \widetilde{\pi\xi}$.

(5.3)  PROPOSITION.  If $X_F$ is cyclic with minimal polynomial $\psi_F$,
then $\dim \Sigma = \deg \psi_F$.

Looking back at Theorem (4.34), we see that the most general
K[z]-module is a direct sum of cyclic K[z]-modules.  By combining
(5.3) and (4.34) and using the fact that dimension is additive under
direct summing, we can replace (4.31) by the following exact result:

(5.4)  PROPOSITION.  If $X_F$ is a torsion module with invariant
factors $\psi_1, \ldots, \psi_r$, then

$$\dim \Sigma = \deg \psi_1 + \ldots + \deg \psi_r.$$

A simple but highly useful consequence of cyclicity is the
so-called control canonical form [KALMAN, FALB, and ARBIB, 1969,
page 44] for a completely reachable pair (F, g) where g is an
n × 1 matrix.  We shall now proceed to deduce this result.

Observe first that "(F, g) completely reachable" is equivalent
to "g generates $X_F$, the module induced by F via (4.13)."  Let

$$\chi_F(z) = \det(zI - F) = z^n + \alpha_1 z^{n-1} + \ldots + \alpha_n, \qquad \alpha_i \in K;$$

then $\chi_F$ is the characteristic (and also the) minimal polynomial for
$X_F$.  [This is a well-known fact of module theory.  See for example
KALMAN, FALB, and ARBIB [1969, Chapter 10, Section 7] for detailed
discussion.]  As in KALMAN [1962], consider the vectors

$$z\cdot e_1 = z\,\chi_F^{(n)}(z)\cdot g = (\chi_F(z) - \alpha_n)\cdot g = -\alpha_n\cdot e_n.$$


Note that the last row of F in (5.6) consists of the coefficients
of $\chi_F$.  By definition, $g = e_n$.  Hence g as a column vector in
$K^n$ has the representation

(5.7)  $$g = [0, \ldots, 0, 1]'.$$

Conversely, suppose (F, g) have the matrix representation (5.6-7)
with respect to some basis in $K^n$.  Then (by direct computation)
the rank condition (2.8) is satisfied and therefore (F, g) is
completely reachable in both the continuous-time and discrete-time
cases (Propositions (2.7) and (2.16)).

We have now proved:


(5.8)  PROPOSITION.  The pair (F, g) is completely reachable
if and only if there is a basis relative to which F is given by
(5.6) and g by (5.7).
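
The converse half of (5.8) can be checked numerically. The sketch below is an illustration only. It assumes the usual companion layout for the control canonical form (ones on the superdiagonal, last row $-\alpha_n, \ldots, -\alpha_1$, and $g = e_n$), which agrees with the remarks that the last row of F carries the coefficients of $\chi_F$ and that $g = e_n$, but the layout of (5.6) itself is an assumption here. K is taken to be the rationals and the function names are hypothetical.

```python
from fractions import Fraction

def companion(alpha):
    """Companion matrix of z^n + alpha[0] z^(n-1) + ... + alpha[n-1] (assumed layout)."""
    n = len(alpha)
    F = [[int(j == i + 1) for j in range(n)] for i in range(n - 1)]
    F.append([-a for a in reversed(alpha)])  # last row: -alpha_n, ..., -alpha_1
    return F

def matvec(F, v):
    return [sum(f * x for f, x in zip(row, v)) for row in F]

def rank(rows):
    """Rank by Gauss-Jordan elimination with exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def krylov_rank(F, g):
    """Rank of the vectors g, Fg, ..., F^(n-1) g."""
    rows, cur = [], list(g)
    for _ in range(len(F)):
        rows.append(cur)
        cur = matvec(F, cur)
    return rank(rows)

F = companion([1, 2, 3])      # chi(z) = z^3 + z^2 + 2z + 3
g = [0, 0, 1]                 # g = e_n
print(F)
print(krylov_rank(F, g))
```

The vectors $g, Fg, F^2 g$ form a triangular array with ones on the anti-diagonal, so the rank equals n whatever the coefficients $\alpha_i$ are.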

(5.9)  COROLLARY.  Given an arbitrary n-th degree polynomial
$\lambda(z) = z^n + \beta_1 z^{n-1} + \ldots + \beta_n$ in K[z], K = arbitrary field.  There
exists an n-vector $\ell$ such that $\lambda = \chi_{F - g\ell'}$ if and only if the
pair (F, g) is completely reachable.

PROOF.  Suppose that (F, g) is completely reachable.
With respect to the same basis (5.5) which exhibits the canonical
forms (5.6-7), define

(5.10)  $$\ell = [\beta_n - \alpha_n, \ldots, \beta_1 - \alpha_1]'.$$

Then verify by direct computation that $\lambda = \chi_{F - g\ell'}$.
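
The direct computation can be carried out on a small example. The sketch below is illustrative and rests on assumptions: K is the rationals, F is taken in the companion layout described above (last row $-\alpha_n, \ldots, -\alpha_1$, $g = e_n$), and the characteristic polynomial is computed by the Faddeev-LeVerrier recursion, which is our choice of method and is valid in characteristic zero. All function names are hypothetical.

```python
from fractions import Fraction

def companion(alpha):
    """Companion matrix of z^n + alpha[0] z^(n-1) + ... + alpha[n-1] (assumed layout)."""
    n = len(alpha)
    F = [[int(j == i + 1) for j in range(n)] for i in range(n - 1)]
    F.append([-a for a in reversed(alpha)])
    return F

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def char_poly(A):
    """Coefficients [1, c_1, ..., c_n] of det(zI - A) by Faddeev-LeVerrier."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    M = [[Fraction(0)] * n for _ in range(n)]
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        M = matmul(A, M)                  # M_k = A M_(k-1) + c_(k-1) I
        for i in range(n):
            M[i][i] += coeffs[-1]
        AM = matmul(A, M)
        coeffs.append(-sum(AM[i][i] for i in range(n)) / k)
    return coeffs

def closed_loop(F, g, l):
    """The matrix F - g l'."""
    n = len(F)
    return [[F[i][j] - g[i] * l[j] for j in range(n)] for i in range(n)]

alpha = [1, 2, 3]            # chi_F(z) = z^3 + z^2 + 2z + 3
beta = [6, 11, 6]            # desired lambda(z) = (z + 1)(z + 2)(z + 3)
n = len(alpha)
F = companion(alpha)
g = [0] * (n - 1) + [1]
# l = (beta_n - alpha_n, ..., beta_1 - alpha_1)', as in (5.10)
l = [beta[n - 1 - j] - alpha[n - 1 - j] for j in range(n)]
print(char_poly(F))
print(char_poly(closed_loop(F, g, l)))
```

Because $g = e_n$, subtracting $g\ell'$ alters only the last row of F, turning the companion matrix of $\chi_F$ into the companion matrix of $\lambda$.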

Conversely, suppose that (F, g) is not completely
reachable.  Then, recalling Proposition (2.12) (which is an
algebraic consequence of (2.8) and hence equally valid for both
continuous-time and discrete-time), $\dim X_2 > 0$ and so is also
$\deg \chi_{F_{22}}$.  Since $X_1$ is an F-invariant subspace of $X = K^n$,
the polynomial $\chi_{F_{11}}$ is independent of the choice of basis in
X and the same is true then also for $\chi_{F_{22}}$.  (In
particular, $\chi_{F_{22}}$ does not depend on the arbitrary choice of
$X_2$ in satisfying the condition $X = X_1 \oplus X_2$.)  In view of (2.12),
we have for all n-vectors $\ell$,

$$\chi_{F - g\ell'} = \chi_{F_{11} - g_1\ell_1'}\,\chi_{F_{22}}, \qquad \deg \chi_{F_{22}} > 0.$$

This contradicts the claim that $\lambda = \chi_{F - g\ell'}$ is true for any $\lambda$
with suitable choice of $\ell$.  □


In view of the importance of this last result, we shall
rephrase it in purely module-theoretic terms:

(5.11)  THEOREM.  Let K be an arbitrary field and X a cyclic
K[z]-module with generator g and minimal polynomial $\chi$ of degree
n.  There is a bijection between n-th degree polynomials
$\lambda(z) = z^n + \beta_1 z^{n-1} + \ldots + \beta_n$ in K[z] and K-homomorphisms
$\ell\colon K^n \to K^n\colon \chi^{(j)}\cdot g \mapsto \ell_j\, g$ (j = 1, ..., n and $\chi^{(j)}$ defined
as in (5.5)) such that $\lambda$ is the minimal polynomial for the
new module structure induced on X by the map $z^\#\colon x \mapsto z\cdot x - \ell(x)$.


Note that in (5.11) $\ell(x)$ corresponds to $g\ell' x$ in (5.10).

The map $\ell$ in (5.11) defines a control law for the system
$\Sigma = (F, g, -)$ corresponding to the module X.  The passage from
z to $z^\#$ is the module-theoretic form of the well-known open-loop
to closed-loop transformation used in classical linear control theory.


PROOF.  Since the vectors $\chi^{(1)}\cdot g, \ldots, \chi^{(n)}\cdot g$ form a
basis for $K^n$, $\ell$ is clearly a well-defined K-homomorphism.  We
treat $\ell$ formally as an element of K[z] (that is, an operator
on X as a K-vector space), by writing $\ell\cdot x = \ell(\xi\cdot g)$, where
$\xi$ represents the equivalence class $[\xi] = \{\xi:\ \xi\cdot g = x\}$.  Unless
identically zero, $\ell$ is never a K[z]-homomorphism and therefore
$\ell$ does not commute with nonunits in K[z].

Define $\ell_j = \beta_j - \alpha_j$, j = 1, ..., n.  We prove first
that this choice of $\ell$ implies $\lambda^{(j)}(z - \ell) = \chi^{(j)}(z)$ for
j = 1, ..., n + 1.  Use induction on j.  By definition,
$\lambda^{(1)}(z - \ell) = \chi^{(1)}(z)$.  In the general case,

$$\begin{aligned}
\lambda^{(j+1)}(z - \ell)\cdot g &= [(z - \ell)\lambda^{(j)}(z - \ell) + \beta_j]\cdot g && (\text{def. of } \lambda^{(j+1)}),\\
&= [(z - \ell)\chi^{(j)}(z) + \beta_j]\cdot g && (\text{inductive hypothesis}),\\
&= [z\chi^{(j)}(z) + \beta_j - \ell_j]\cdot g && (\text{def. of } \ell),\\
&= [z\chi^{(j)}(z) + \alpha_j]\cdot g && (\text{def. of } \ell_j),\\
&= \chi^{(j+1)}(z)\cdot g && (\text{def. of } \chi^{(j+1)}).
\end{aligned}$$

It follows (case j = n + 1) that $\lambda$ annihilates X
regarded as a $K[z^\#]$-module.  On the other hand, the set
$\lambda^{(1)}(z^\#)\cdot g, \ldots, \lambda^{(n)}(z^\#)\cdot g$ is a basis for X as a K-vector
space since $\chi^{(1)}(z)\cdot g, \ldots, \chi^{(n)}(z)\cdot g$ was such a basis.  So X
is cyclic with generator g also as a $K[z^\#]$-module.  Hence
by Propositions (5.1-2) the annihilating ideal of g with respect
to the $K[z^\#]$-module structure cannot be generated by a polynomial
of degree less than n, that is, $\lambda$ is indeed the minimal polynomial
with respect to $z^\#$.  The correspondence $\lambda \leftrightarrow \ell$ is obviously
bijective.  □


The proof immediately implies the following

(5.12)  COROLLARY.  Let $x = \xi\cdot g$ be any element of X viewed
as a K[z]-module.  Then x has the representation $\xi^\#\cdot g$ with
respect to the $K[z^\#]$-module structure on X, where $\xi$ and $\xi^\#$
are related as

$$\xi(z) = \sum_{j=1}^{n} \xi_j\,\chi^{(j)}(z) \quad\leftrightarrow\quad \xi^\#(z^\#) = \sum_{j=1}^{n} \xi_j\,\lambda^{(j)}(z^\#).$$

So the open-loop/closed-loop transformation is essentially a
change in the canonical basis, provided X is cyclic.

It is interesting that the polynomials $\chi^{(j)}$ have long been known in
Algebra (they are related to the Tschirnhausen transformation
discussed  extensively  by  WEBER  [1898,  §46,  54,  7^*  85,  96]),  but  their 
present  (very  natural)  use  in  module  theory  seems  to  be  new. 

**Theorem (5.11) may be viewed as the central special case
of Theorem A of the Introduction.  Let us restate the latter in
precise form as follows:

(5.13)  THEOREM.  Given an arbitrary n-th degree polynomial
$\lambda(z) = z^n + \beta_1 z^{n-1} + \ldots + \beta_n$ in K[z], K = arbitrary field.
There exists an n × m matrix L over K such that $\chi_{F - GL'} = \lambda$
if and only if (F, G) is completely reachable.

For some time, this result had the status of a well-known folk theorem,
considered to be a straightforward consequence of (5.9).  The latter
has been discovered independently by many people.  (I first heard
of it in 1958, proposed as a conjecture by J. E. Bertram and proved
soon afterwards by the so-called root-locus method.)  Indeed, the
passage from (5.11) to (5.13) is primarily a technical problem.  A
proof of (5.13) was given by LANGENHOP [1964] and subsequently
simplified by WONHAM [1967].  The first proof was (unnecessarily)
very long, but the second proof is also unsatisfactory, since
it depends on arguments using a splitting field of K
and fails when K is a finite field.  We shall use this situation
as an excuse to illustrate the power of the module-theoretic
approach and to give a proof of (5.13) valid for arbitrary fields.

**The material between these marks was added after the Summer
School.

The procedure of LANGENHOP and WONHAM rests on the following
fact, of which we give a module-theoretic proof:


(5.14)  LEMMA.  Let K be an arbitrary but infinite field.  Let
F be cyclic* and (F, G) completely reachable.  Then there is
an m-vector $a \in K^m$ such that (F, Ga) is also completely
reachable.

We begin with a simple remark, which is also useful in
reducing the proof of (5.13) to Lemma (5.18).

(5.15)  SUBLEMMA.  Every submodule of a cyclic module over a
principal-ideal domain is cyclic.


PROOF OF (5.14).  We use induction on m.  The case
m = 1 is trivial.  The general case amounts to the following.
Consider the submodule Y of $X = X_F$ generated by the columns
$g_1, \ldots, g_{m-1}$ of G.  In view of (5.15), Y is cyclic.  By the
inductive hypothesis, we are given the existence of a cyclic
generator of Y of the form $g_Y = \alpha_1\cdot g_1 + \ldots + \alpha_{m-1}\cdot g_{m-1}$, $\alpha_i \in K$.
We must prove: for suitable $\alpha, \beta \in K$ the vector $\alpha\cdot g_Y + \beta\cdot g_m$
is a cyclic generator for X.

*Of course, this means that the K[z]-module $X_F$ (see (4.13))
is cyclic.

By hypothesis, X has an (abstract) cyclic generator
$g_X$.  By cyclicity we have the representations

$$g_Y = \eta\cdot g_X, \qquad g_m = \mu\cdot g_X, \qquad \eta, \mu \in K[z].$$

Hence our problem is reduced to proving the following: for suitable
$\alpha, \beta \in K$ the polynomial $\alpha\eta + \beta\mu$ is a unit in $K[z]/\chi_F K[z]$.  This,
in turn, is equivalent to proving

(5.16)  $$\alpha\eta + \beta\mu \not\equiv 0 \pmod{\theta_i}, \qquad i = 1, \ldots, r,$$

where $\theta_1, \ldots, \theta_r$ in K[z] are the unique prime factors of
$\chi_F$.  Let ~ mean the representative of least degree of equivalence
classes mod $\theta_i$.  Then no pair $(\tilde\eta_i, \tilde\mu_i)$, i = 1, ..., r, can be
zero.  For if one is, then $\theta_i \mid (\chi_F, \eta, \mu)$, that is, $\chi_F/\theta_i$ annihilates
the submodule $X' = K[z]g_Y + K[z]g_m$, whence X' is a proper submodule
of X, contradicting the fact that (F, G) is completely
reachable.  If all the $\tilde\mu_i$ are zero, then every $\tilde\eta_i \neq 0$, so $\eta$
is a unit in $K[z]/\chi_F K[z]$, and $g_Y$ is already a cyclic generator.
So let $\alpha = 1$.  Then the condition $\tilde\eta_i + \beta\tilde\mu_i = 0$ eliminates at most
r values of $\beta$ from consideration.  Since K is infinite by
hypothesis, there are always some $\beta$ which satisfy (5.16).  □

An essential part of the lemma is the stipulation that $a \in K^m$.
The hypothesis "F = cyclic + (F, G) = completely reachable" means that

$$g_X = \alpha_1\cdot g_1 + \ldots + \alpha_m\cdot g_m, \qquad \alpha_i \in K[z];$$

that is, the lemma is trivially true for some $a \in K^m[z]$ since
$g_X = Ga$.  But since we want $a \in K^m$, there must be interaction
between vector-space structure and module structure, and for this
reason the lemma is nontrivial.  As a matter of fact, the lemma is false
when K = finite field.  The simplest counterexample is provided
when (5.16) rules out a single nonzero value of $\beta$, thereby ruling
out all nonzero values.

(5.17)  COUNTEREXAMPLE.  Let K = Z/2Z, that is, the ring of
integers modulo the prime ideal 2Z.  Consider


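$$F = \begin{pmatrix} 0&1&0&0&0\\ 1&1&0&0&0\\ 0&0&0&1&0\\ 0&0&0&0&0\\ 0&0&0&0&1 \end{pmatrix}, \qquad
G = \begin{pmatrix} 0&0\\ 1&0\\ 0&0\\ 0&1\\ 1&1 \end{pmatrix}.$$

Notice that $X_F = X_1 \oplus X_2 \oplus X_3$ (as a K[z]-module), where the
minimal polynomials of the direct summands are

$$\chi_1(z) = z^2 + z + 1, \qquad \chi_2(z) = z^2, \qquad \chi_3(z) = z + 1.$$

All these factors are relatively prime, $(\chi_1, \chi_2, \chi_3) = 1$, hence
X is cyclic.  Notice also that $g_1$ generates $X_1 \oplus X_3$, while
$g_2$ generates $X_2 \oplus X_3$.  A cyclic generator for X is
$g_X = [0, 1, 0, 1, 1]'$.  A simple calculation gives

$$g_1 = z^3\cdot g_X, \qquad g_2 = (z^4 + z^2 + 1)\cdot g_X.$$

Conditions (5.16) are here

$$\begin{aligned}
\alpha\cdot 1 + \beta\cdot 0 &\not\equiv 0 \pmod{\chi_1},\\
\alpha\cdot 0 + \beta\cdot 1 &\not\equiv 0 \pmod{\chi_2},\\
\alpha\cdot 1 + \beta\cdot 1 &\not\equiv 0 \pmod{\chi_3}.
\end{aligned}$$

These conditions have no solution in Z/2Z.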
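
Since K = Z/2Z has only three nonzero vectors a in $K^2$, the counterexample can be confirmed by exhaustive search. The sketch below is an illustration with hypothetical helper names; it computes, over Z/2Z, the dimension of the submodule generated by Ga for each nonzero a, and the dimension of the submodule generated by both columns of G.

```python
def matvec(F, v):
    """Matrix-vector product over Z/2Z."""
    return [sum(f * x for f, x in zip(row, v)) % 2 for row in F]

def rank_gf2(rows):
    """Rank over Z/2Z by Gauss-Jordan elimination (entries are 0 or 1)."""
    m = [row[:] for row in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c]), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c]:
                m[i] = [a ^ b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def reachable_dim(F, vectors):
    """Dimension over Z/2Z of the K[z]-submodule generated by `vectors`."""
    rows = []
    for v in vectors:
        cur = [x % 2 for x in v]
        for _ in range(len(F)):
            rows.append(cur)
            cur = matvec(F, cur)
    return rank_gf2(rows)

F = [[0, 1, 0, 0, 0],
     [1, 1, 0, 0, 0],
     [0, 0, 0, 1, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 0, 0, 1]]
g1 = [0, 1, 0, 0, 1]   # first column of G
g2 = [0, 0, 0, 1, 1]   # second column of G

dims = {}
for a in [(1, 0), (0, 1), (1, 1)]:
    Ga = [(a[0] * x + a[1] * y) % 2 for x, y in zip(g1, g2)]
    dims[a] = reachable_dim(F, [Ga])

print(reachable_dim(F, [g1, g2]), dims)
```

The pair (F, G) is completely reachable (dimension 5), but every single combination Ga with a in $K^2$ generates a proper submodule, so Lemma (5.14) indeed fails over this finite field.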

At this point, the following is the situation concerning
Theorem (5.13):

(1)  Its counterpart, Theorem A of the Introduction, was
claimed to be true in the continuous-time case under the hypothesis
of complete controllability.

(2)  In the discrete-time case (5.13) with the preceding
hypothesis Theorem A is false, because of the counterexample: the pair
(F = nilpotent, G = 0) is completely controllable, but evidently
$\chi_{F - GL'}$ is independent of L.  However, in view of (5.11), Theorem
(5.13) might be true also in the discrete-time case if "complete
controllability" is replaced by "complete reachability", this modification
being immaterial in the continuous-time case.

(3)  Because of (5.17), we might expect that a theorem like (5.13)
is false for an arbitrary field K.

(4)  If our general claim that reachability properties are
reflected in module-theoretic properties is true, then (5.13)
should hold without assumptions concerning K, because the principal
module-theoretic fact, that K[z] = principal ideal domain, is
independent of the specific choice of K.

We now proceed to establish Theorem (5.13).  That is, special
hypotheses on K will turn out to be irrelevant.

PROOF OF (5.13).  Necessity is proved exactly as in (5.8).
Sufficiency will follow by induction on m, once we have proved it
in the special case m = 2:

(5.18)  LEMMA.  Let K be an arbitrary field and let X be a
K[z]-module generated by $g_1$, $g_2$.  There is a K-homomorphism $\ell$
(of the type defined in (5.11)) such that if $z^\# = z - \ell$ induces a
$K[z^\#]$-module structure on X then X is cyclic with respect to this
structure and is generated by either $g_1 + g_2$ or $g_2$.

PROOF.  Let $Y = K[z]g_1$ and $Z = K[z]g_2$.

Case 1.  $Y \cap Z = 0$, that is, $X = Y \oplus Z$.  In (5.11)
take an $\ell$ such that $\ell(x) = 0$ for all $x \in Z$.  Replacing z by
$z^\# = z - \ell$ will change the K[z]-module structure on Y but preserve
that on Z.  Further, choose $\ell$ so that the new minimal polynomial
$\chi^\#_Y$ on Y is prime to the unchanged minimal polynomial $\chi^\#_Z = \chi_Z$
on Z.  Thus there exist polynomials $\nu$, $\sigma$ such that $\nu\chi^\#_Y + \sigma\chi_Z = 1$.
By hypothesis, every $x \in X$ has the representation

$$x = y + z = \eta\cdot g_1 + \xi\cdot g_2.$$

Now verify that

$$\begin{aligned}
x &= (\eta\sigma\chi_Z + \xi\nu\chi^\#_Y)\cdot(g_1 + g_2),\\
&= \eta\sigma\chi_Z\cdot g_1 + \xi\nu\chi^\#_Y\cdot g_2,\\
&= \eta(1 - \nu\chi^\#_Y)\cdot g_1 + \xi(1 - \sigma\chi_Z)\cdot g_2,\\
&= \eta\cdot g_1 + \xi\cdot g_2.
\end{aligned}$$

Hence $g_1 + g_2$ is indeed a cyclic generator for X as a
$K[z^\#]$-module.

Case 2.  $Y \cap Z = W \neq 0$.  Let $w \in W$.  By hypothesis,
there is a $\xi \in K[z]$ such that $w = \xi\cdot g_2$ and therefore, by
cyclicity of Y, there is also an $\eta \in K[z]$ such that $\xi\cdot g_2 = w = \eta\cdot g_1$.

Take some $w \neq 0$.  Then if $\eta$ = unit (mod $\chi_Y$) we are done because
$\eta^{-1}\xi\cdot g_2 = g_1$ generates Y, and so Z = X.  In the nontrivial case,
$\eta \neq$ unit (mod $\chi_Y$).  To show: there is a suitable new module structure
on X such that $\eta^\#$ = unit (mod $\chi^\#$), $\chi^\#$ being the minimal polynomial
of X as a $K[z^\#]$-module.

The main facts we need are the following:

(5.19)  SUBLEMMA.  Let $\chi$ be a fixed element of K[z] with
$\deg \chi = n$, $F_\chi$ the companion matrix of $\chi$ given by (5.6), $X_{F_\chi}$
the cyclic module induced by $F_\chi$, and g a cyclic generator of
$X_{F_\chi}$.  Then $\eta \in K[z]$ is a unit modulo $\chi$ if and only if $\eta\cdot g$ is
also a cyclic generator of $X_{F_\chi}$.


PROOF.  Obvious.

(5.20)  SUBLEMMA.  Same notations as in (5.19).  Write

$$\eta = \sum_{j=1}^{n} \eta_j\,\chi^{(j)}(z) \qquad (\chi^{(j)} \text{ defined in (5.5)}).$$

Then $\eta$ is a unit modulo $\chi$ if and only if

(5.21)  $$\det\,(y,\ F_\chi y,\ \ldots,\ F_\chi^{n-1} y) \neq 0,$$

where y is the column vector

(5.22)  $$y = [\eta_n, \ldots, \eta_1]'.$$

PROOF.  Since $\chi^{(1)}, \ldots, \chi^{(n)}$ is the basis for the
K-vector space of all polynomials of degree < n, the n-tuple
$(\eta_1, \ldots, \eta_n)$ is uniquely determined by $\eta$.  By definition $F_\chi$
is the matrix representing the module operator $z\colon x \mapsto z\cdot x$ relative
to the special basis $e_1, \ldots, e_n$ in $X_{F_\chi}$ given by (5.5).  Similarly,
using one of the module axioms, we verify that

$$\eta\cdot g = \sum_{j=1}^{n} \eta_j\,\chi^{(j)}(z)\cdot g = \sum_{j=1}^{n} \eta_{n-j+1}\, e_j.$$

In other words, the numerical vector (5.22) represents the abstract
vector $\eta\cdot g$ in $X_{F_\chi}$ relative to the same basis $e_1, \ldots, e_n$.  Recall
that $\eta\cdot g$ generates $X_{F_\chi}$ iff $(F_\chi, \eta(F_\chi)g)$ is completely reachable.
By (2.7) the latter condition is equivalent to (5.21).  The rest
follows from (5.19).  □

(5.23)  SUBLEMMA.  Same notations as in (5.19) and (5.20).  Given
any nonzero numerical n-vector (5.22), there exists a polynomial $\chi$
such that (5.21) is satisfied.

PROOF.  Let $\eta_r$ be the first member of the sequence of
numbers $\eta_1, \eta_2, \ldots$ which is nonzero.  Write

$$\chi(z) = z^n + \alpha_1 z^{n-1} + \ldots + \alpha_n,$$

and determine the first r coefficients of $\chi$ by the rule

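    $\begin{bmatrix} \eta_r & \eta_{r+1} & \cdots & \eta_n \\ 0 & \eta_r & \cdots & \eta_{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \eta_r \end{bmatrix}$

[Display damaged in the source: a linear system for the coefficients
$\alpha_i$ whose coefficient matrix is the upper-triangular matrix above;
the two column vectors of the system are not recoverable.]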

(Since all numbers belong to a field, the required values of
$\alpha_1, \ldots, \alpha_n$ exist.) Now check, by computation, that these
conditions reduce the matrix in (5.21) to the direct sum of two triangular
matrices, each with nonzero elements on its diagonal. □

In view of (5.12), it follows from these facts that we can
always choose a new $\chi_*$ such that $\eta^*$ = unit mod $\chi_*$.

The proof of Case 2 is not yet complete, however, because
we must still extend the $K[z_*]$-module structure from Y to X. This
is easy. Write first $Z = W \oplus Z'$ and then $X = Y \oplus Z'$, where the
direct sum is now with respect to the K-module structure of X. Extend
$\ell$ from Y to X by setting $\ell|Z' = 0$. Now we have a new minimal
polynomial $\chi_*$ defined over X. Since $z_*$ = [illegible] on Y,
[illegible]. By (5.12), $\xi$ is replaced by some $\xi^*$ such that

(5.24)    $w = \xi^*\cdot g_2 = \eta^*\cdot g_1$;

that is, our previous representation of $w \neq 0$ in W induces a
similar representation with respect to the new $K[z_*]$-module structure
on X. Since $\eta^*$ is a unit modulo $\chi_*$, we can write

    $\sigma\eta^* = 1 + \tau\chi_*$, with $\sigma, \tau \in K[z_*]$.

By (5.24), we have, with respect to the $K[z_*]$-structure,

    $(\sigma\xi^*)\cdot g_2 = \sigma\cdot(\eta^*\cdot g_1)$,
                            $= (1 + \tau\chi_*)\cdot g_1$,
                            $= g_1$.

This proves that $g_2$ generates both Y and Z; that is, $g_2$ is
a cyclic generator for X endowed with the $K[z_*]$-structure. The
proof of Lemma (5.18) is now complete. □
It should be clear that Theorem (5.13) is not a purely
module-theoretic result, but depends on the interplay between module
theory, vector spaces, and elimination theory (via (5.21)). For instance,
the fact that $\ell$ can be extended from Y to X, which was needed
in the proof of Case 2, is a typical vector-space argument.

There are many open (or forgotten) results concerning cyclic
modules which are of interest in system theory. For instance, it
is easy to show that an n × n real matrix is cyclic iff a certain
polynomial $\Psi \in R[z_1, \ldots, z_{n^2}]$ is nonzero at F; the polynomial
$\Psi$ is roughly analogous to the polynomial det in the same ring,
but, unlike in the latter case, the general form of $\Psi$ does not seem
to be known.

We must not terminate this discussion without pointing out
another consequence of cyclicity which transcends the module framework.
Since X = cyclic with generator g is isomorphic with
$K[z]/\chi_X K[z]$, it is clear that X also has the structure of this
commutative ring, that is, the product is defined as

    $x \times y = (\xi\cdot g) \times (\eta\cdot g) = (\xi\eta)\cdot g = (\eta\xi)\cdot g$.

If $\chi_X$ = irreducible, then X is even a field. Hence, in particular,
X has a galois group. No one has ever given a dynamical interpretation
of this galois group. In other words, there are obvious algebraic
facts in the theory of dynamical systems which have never been examined
from the dynamical point of view. For some related comments in the
setting of topological semigroups, see DAY and WALLACE [1967].
(6.0) PREAMBLE. There has been a vigorous tradition in engineering
(especially in electrical engineering in the United States during
1940-1960) that seeks to phrase all results of the theory of linear
constant dynamical systems in the language of the Laplace transform.
Textbooks in this area often try to motivate their biased point of
view by claiming that "the Laplace transform reduces the analytical
problem of solving a differential equation to an algebraic problem".
When directed to a mathematician, such claims are highly misleading
because the mathematical ideas of the Laplace transform are never in
fact used. The ideas which are actually used belong to classical
complex function theory: properties of rational functions, the
partial-fraction expansion, residue calculus, etc. More importantly,
the word "algebraic" is used in engineering in an archaic sense and
the actual (modern) algebraic content of engineering education and
practice as related to linear systems is very meager. For example,
the crucial concept of the transfer function is usually introduced
via heuristic arguments based on linearity or "defined" purely formally
as "the ratio of Laplace transforms of the output over the input". To
do the job right, and to recognize the transfer function as a natural
and purely algebraic gadget, requires a drastically new point of view,
which is now at hand as the machinery set up in Sections 3-5. The
essential idea of our present treatment was first published in
KALMAN [1965b].
The first purpose of this section is to give an intrinsically
algebraic definition of the transfer function associated with a
discrete-time, constant, linear input/output map (see Definition (3.10)).
Since the applications of transfer functions are standard, we shall not
develop them in detail, but we do want to emphasize their role in relating
the classical invariant factor theorem for polynomial matrices to
the corresponding module theorem (4.34).

Consider an arbitrary K[z]-homomorphism $f\colon \Omega \to \Gamma$ (see lemma
(g) following Theorem (4.2)). Then as a "mathematical object" f is
equivalent to the set $\{f(e_j),\ j = 1, \ldots, m\}$, with $e_j$ defined by
(4.6), since

(6.1)    $f(\omega) = \sum_{j=1}^{m} \omega_j \cdot f(e_j)$.

(The scalar product on the right is that in the K[z]-module $\Gamma$, as
defined in Section 4.) By definition of $\Gamma$, each $f(e_j)$ is a formal
power series in $z^{-1}$ with vanishing first term. We shall try to
represent these formal power series by ratios of polynomials (which
we shall call transfer functions*) and then we can replace formula (6.1)
by a certain specially defined product of a ratio of polynomials by a
polynomial. Some algebraic sophistication will be needed to find the
correct rules of calculations. These "rules" will constitute a
rigorous (and simple) version of Heaviside's so-called "calculus".
There are no conceptual complications of any sort. (However, we are
dodging some difficulties by working solely in discrete-time.)

*This entrenched terminology is rather unenlightening in the present
algebraic context.


Let $X_f = \Omega/\mathrm{kernel}\ f$ be the state set of f regarded as
a K[z]-module. We assume that $X_f$ is a torsion module with nontrivial
minimal polynomial $\psi$. Then, for each $j = 1, \ldots, m$ we have

(6.2)    $\psi\cdot f(e_j) = f(\psi\cdot e_j) = \iota([\psi\cdot e_j]) = \iota(\psi\cdot[e_j]) = 0$.

By definition of the module structure on $\Gamma$, (6.2) means that the
ordinary product of the power series $f(e_j)$ by the polynomial $\psi$ is
a (vector) polynomial. Hence (6.2) is equivalent to (notation:
no dot = ordinary product)

(6.2')    $\psi f(e_j) = \theta_j \in K^p[z]$,    $j = 1, \ldots, m$.

Intuitively, we can solve this equation by writing $f(e_j) = \theta_j/\psi$.
There are two ways of making this idea rigorous.

Method 1. Define

(6.3)    $f(e_j) = \theta_j/\psi$

as the formal division of $\theta_j$ by $\psi$ into ascending powers of $z^{-1}$.
Check that the coefficient of $z^0$ is always 0. Verify by computation
that the power series so obtained satisfies (6.2').

Method 2. Multiply both sides of (6.2') by $z^{-n}$, $n = \deg \psi$. Write
$\tilde\psi(z^{-1}) = z^{-n}\psi(z)$ and $\tilde\theta_j(z^{-1}) = z^{-n}\theta_j(z)$.
Then $\tilde\psi \in K[z^{-1}] \subset K[[z^{-1}]]$ and (6.2') becomes

(6.2'')    $\tilde\psi f(e_j) = \tilde\theta_j \in K^p[z^{-1}]$.

Moreover, the 0-th coefficient of $\tilde\psi$ is 1 (because of the convention
that the leading coefficient of $\psi$ is 1), hence $\tilde\psi$ is a unit in
$K[[z^{-1}]]$ and therefore

(6.3')    $f(e_j) = \tilde\theta_j(z^{-1})\,\tilde\psi^{-1}(z^{-1})$.

Note that (6.3) and (6.3') actually give slightly different definitions
of $f(e_j)$, depending on whether we use a transfer function with
respect to the variable z or $z^{-1}$. (Both notations have been used
in the engineering literature.) For us the formalism of Method 1 is
preferable. (The calculations of Method 1 can be reduced by Method 2
to the better-known calculations of the inverse in the ring $K[[z^{-1}]]$.)
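
The formal division of Method 1 can be sketched in a few lines for the scalar case p = 1. This is only an illustration under stated assumptions: $\psi$ is monic, deg $\theta$ < deg $\psi$, polynomials are coefficient lists with the lowest degree first, and the names `expand` and `satisfies_6_2_prime` are hypothetical helpers rather than anything defined in the text.

```python
from fractions import Fraction

def expand(theta, psi, terms):
    """Coefficients c_1, ..., c_terms of theta/psi = sum_k c_k z^(-k),
    obtained by long division into ascending powers of z^(-1)."""
    n = len(psi) - 1
    r = [Fraction(x) for x in theta] + [Fraction(0)] * (n - len(theta))
    out = []
    for _ in range(terms):
        shifted = [Fraction(0)] + r            # multiply the remainder by z
        c = shifted[n]                         # next series coefficient
        out.append(c)
        r = [shifted[i] - c * psi[i] for i in range(n)]
    return out

def satisfies_6_2_prime(theta, psi, c):
    """Check psi * (sum_k c_k z^(-k)) = theta on every power of z that the
    truncated series determines completely (needs len(c) >= deg psi)."""
    n = len(psi) - 1

    def coeff(k):
        return c[k - 1] if 1 <= k <= len(c) else 0

    for i in range(n - 1, -(len(c) - n) - 1, -1):
        total = sum(psi[j] * coeff(j - i) for j in range(n + 1))
        expected = theta[i] if 0 <= i < len(theta) else 0
        if total != expected:
            return False
    return True

psi = [-1, -1, 1]      # psi(z) = z^2 - z - 1
theta = [0, 1]         # theta(z) = z
series = expand(theta, psi, 6)
print(series)          # coefficients of z^-1, z^-2, ...
print(satisfies_6_2_prime(theta, psi, series))
```

For this choice of $\theta$ and $\psi$ the coefficients are the Fibonacci numbers 1, 1, 2, 3, 5, 8, and the series has no $z^0$ term, as Method 1 requires.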

Summarizing, we have the easy but fundamental result:

(6.4) EXISTENCE OF TRANSFER FUNCTIONS. There is a bijective
correspondence between K[z]-homomorphisms $f\colon \Omega \to \Gamma$ with minimal
polynomial $\psi$ and transfer function matrices of the type

    $Z = [\theta_1/\psi, \ldots, \theta_m/\psi]$

where $\theta_j \in K^p[z]$, $\deg \theta_j < \deg \psi$, and $\psi$ is the least common
denominator of Z.

In many contexts, it is preferable to deal with the $Z_f$ corresponding
to f rather than with f itself. Because the correspondence
is bijective, it is clear that all objects induced by f are
well-defined also for $Z_f$ and conversely. Thus, for instance,

    dim $Z_f$ = dim f = dim $X_f$;

    $\psi_Z$ = least common denominator of Z,
             = minimal polynomial of $f_Z$.
(6.5) REMARK. In view of Propositions (4.20-21), the natural
realization of Z, namely $X_Z$, is completely reachable as
well as completely observable. Not having this fact available before 1960
has caused a great confusion. Questions such as those resolved by Theorem (5.13)
tended to be attacked algorithmically, using special tricks amounting
to elementary algebraic manipulations of elements of Z. Very few
theoretical results could be conclusively established by this route
until the conceptual foundations of the theory of reachability and
observability were developed.

The preceding results may be restated as "rules" whereby the
values of f may be computed using Z. We have in fact, $f(\omega) = Z\cdot\omega$, where

(6.6)    $Z\cdot\omega = (\psi Z\cdot\omega)/\psi$,

         = multiply the polynomial matrix $\psi Z$ consisting of
           the numerators of Z with $\omega$, reduce to minimal-
           degree polynomials modulo $\psi$, and then divide
           formally by $\psi$ as in Method 1 above.

We can also compute the entire output of the system (that is,
all output values following the application of the first nonzero input
value) by the rule

(6.7)    $Z\omega = (\psi Z\omega)/\psi$,

         = same as above, but do not reduce modulo $\psi$.

In this second case, the output sequence will begin with a positive
power of z. (The coefficients of the positive powers of z are
thrown away in the definition of f (see (3.7)) and in the definition
of the scalar product in $\Gamma$, in order to secure a simple formula
for $X_f = \Omega/\mathrm{kernel}\ f$.)
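
The difference between Rules (6.6) and (6.7) shows up already in a scalar example. The sketch below is illustrative only: it assumes p = m = 1, a monic $\psi$, coefficient lists with the lowest degree first, and helper names (`poly_mul`, `poly_divmod`, `expand`) that are not part of the text.

```python
from fractions import Fraction

def poly_mul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_divmod(a, psi):
    """Quotient and remainder of a by the monic polynomial psi."""
    n = len(psi) - 1
    rem = [Fraction(x) for x in a]
    quo = [Fraction(0)] * max(len(rem) - n, 0)
    for i in range(len(rem) - n - 1, -1, -1):
        c = rem[i + n]
        quo[i] = c
        for j, p in enumerate(psi):
            rem[i + j] -= c * p
    rem = rem[:n] + [Fraction(0)] * max(n - len(rem), 0)
    return quo, rem

def expand(theta, psi, terms):
    """Coefficients of z^-1, z^-2, ... in theta/psi (deg theta < deg psi)."""
    n = len(psi) - 1
    r = list(theta)
    out = []
    for _ in range(terms):
        shifted = [Fraction(0)] + r
        c = shifted[n]
        out.append(c)
        r = [shifted[i] - c * psi[i] for i in range(n)]
    return out

psi = [-1, -1, 1]        # psi(z) = z^2 - z - 1
theta = [0, 1]           # psi * Z = z, that is, Z = z / psi
omega = [1, 1]           # omega = 1 + z (inputs at t = 0 and t = -1)

numerator = poly_mul(theta, omega)            # (psi Z) omega, not reduced
quotient, reduced = poly_divmod(numerator, psi)

rule_6_6 = expand(reduced, psi, 4)            # f(omega): powers z^-1, z^-2, ...
rule_6_7 = (quotient, rule_6_6)               # entire output: polynomial part too
print(quotient, rule_6_6)
```

Here Rule (6.7) yields the polynomial part 1 followed by 2, 3, 5, 8 on $z^{-1}, \ldots, z^{-4}$, while Rule (6.6) discards the polynomial part and keeps only the second list, which is $f(\omega)$.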

Many other applications of transfer functions may be found in
KALMAN, FALB, and ARBIB [1969, Chapter 10, Section 10].

It is easy to show that the transfer function associated with
the system $\Sigma = (F, G, H)$ is given by $Z_\Sigma = H(zI - F)^{-1}G$. (This is
just the formal Laplace transform computed from the constant version
of (1.12) by setting z = d/dt or from (1.17) by setting
x(t + 1) = zx(t).) Probably the simplest way of computing Z is
via the formula

(6.8)    $(zI - F)^{-1} = \sum_{j=1}^{q} z^{q-j}\,\psi_F^{(j)}(F)/\psi_F(z)$,    $q = \deg \psi_F$,

where $\psi_F$ is the minimal polynomial of the matrix F and the
superscript denotes the special polynomials defined in (5.5). The matrix
identity (6.8) follows at once from the classical scalar identity
[WEBER, 1898, §4]

    $\pi(z) - \pi(w) = (z - w) \sum_{j=1}^{q} z^{q-j}\,\pi^{(j)}(w)$,    $q = \deg \pi$,

upon setting w = F, $\pi = \psi_F$, and invoking the Cayley-Hamilton theorem.
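
A quick numerical check of an identity of this type is sketched below. The indexing is an assumption made for the sketch, not something taken from (5.5): for $\pi(z) = z^q + \alpha_1 z^{q-1} + \ldots + \alpha_q$ it uses $\pi^{(1)} = 1$ and $\pi^{(j+1)}(w) = w\,\pi^{(j)}(w) + \alpha_j$. The helper names are hypothetical, and the example polynomial is the characteristic polynomial of F, which annihilates F.

```python
def mat_mul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def mat_add(a, b):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]

def scaled_identity(c, n):
    return [[c if i == j else 0 for j in range(n)] for i in range(n)]

def horner_matrices(F, alphas):
    """[pi^(1)(F), ..., pi^(q)(F)] under the assumed Horner-type indexing."""
    n = len(F)
    mats = [scaled_identity(1, n)]
    for a in alphas[:-1]:
        mats.append(mat_add(mat_mul(F, mats[-1]), scaled_identity(a, n)))
    return mats

def resolvent_numerator(F, alphas, z):
    """sum_j z^(q-j) pi^(j)(F), evaluated at a number z."""
    q = len(alphas)
    total = scaled_identity(0, len(F))
    for j, m in enumerate(horner_matrices(F, alphas), start=1):
        total = mat_add(total, [[z ** (q - j) * x for x in row] for row in m])
    return total

F = [[0, 1], [-2, -3]]
alphas = [3, 2]                  # pi(z) = z^2 + 3z + 2, and pi(F) = 0
z = 5
z_minus_F = mat_add(scaled_identity(z, 2), [[-x for x in row] for row in F])
product = mat_mul(z_minus_F, resolvent_numerator(F, alphas, z))
print(product)                   # pi(z) times the identity
```

Since $\pi(5) = 42$, the product is 42 times the identity, so dividing the numerator by $\pi(z)$ gives $(zI - F)^{-1}$ at z = 5.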

Much of classical linear system theory was concerned with computing
Z. In the modern context, this problem "factors" into first solving
the realization problem $f \to \Sigma_f$ and then applying formula (6.8). See
Sections 8 and 9.

One of the mysterious features of Rule (6.6) (as contrasted with
the conventional rule (6.7)) is the necessity of reducing modulo $\psi$.
The simplest way of understanding the importance of this
aspect of the problem is to show how to relate the module invariant
factors occurring in the structure theorem (4.34) to the classical
facts concerning the invariant factors of a polynomial matrix.

(6.9) INVARIANT FACTOR THEOREM FOR MATRICES. Let P be a p × m
matrix with elements in an arbitrary principal-ideal domain R. Then

(6.10)    $P = A \Lambda B$,

where A and B are p × p and m × m matrices (not necessarily
unique) with elements in R and det A, det B units in R, while

(6.11)    $\Lambda = \mathrm{diag}\,(\lambda_1, \ldots, \lambda_q, 0, \ldots, 0)$ with $\lambda_i \in R$

is unique (up to units in R) with $\lambda_i \mid \lambda_{i+1}$, $i = 1, \ldots, q - 1$, and
q = rank P. The $\lambda_i$ are called the invariant factors of P.

As anyone would expect, there is a correspondence between the
module structure theorem (4.34) and the matrix structure theorem (6.9)
and, in particular, between the respective invariant factors $\psi_1, \ldots, \psi_r$
and $\lambda_1, \ldots, \lambda_q$. Let us sketch the standard proof of this fact
following CURTIS and REINER [1962, §13.3] who also give a proof of (6.9).

PROOF OF (4.34). Consider the R-homomorphism $\mu$ from $R^m$
onto M given by $\mu\colon e_j \mapsto g_j$, where the $e_j$ are the standard
basis elements of $R^m$ (recall (4.6)) and the $g_j$ generate M.

Clearly, $M \approx R^m/N$, where N = kernel $\mu$. It can be proved that
$N \approx R^\ell$ is a free submodule of $R^m$, with a basis of at most $\ell \leq m$
elements. Write each basis element $f_k$ of N as $\sum_j p_{jk}\cdot e_j$, $p_{jk} \in R$.
Apply (6.9) to the R-matrix P. Define $\hat f_k = \sum_i c_{ik}\cdot f_i$, $C = B^{-1}$,
$\hat e_k = \sum_i a_{ik}\cdot e_i$; by (6.10-11), $\hat f_k = \lambda_k\cdot\hat e_k$. Hence

    $N = \lambda_1 R \oplus \ldots \oplus \lambda_\ell R$.

Then, by "direct sum",

    $M \approx R/\lambda_1 R \oplus \ldots \oplus R/\lambda_\ell R \oplus R^{m-\ell}$.

That is, (4.34) holds with $\psi_i = \lambda_i$ and r = rank P = $\ell$. □

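By the same type of calculations, we can prove also

(6.12) THEOREM. Let $\lambda_1, \ldots, \lambda_q$ be the invariant factors of
$\psi Z$ (see (6.9)), $(\lambda_i, \psi) = \theta_i$, $i = 1, \ldots, q$. Then the
invariant factors of $X_Z$ are

    $\psi_1 = \psi$,
    $\psi_2 = \psi/\theta_2$,
    ...
    $\psi_r = \psi/\theta_r$,

where r is the smallest integer such that $\psi \mid \lambda_i$ for
$i = r + 1, \ldots, q$, and q = rank $\psi Z$.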

PROOF. Consider the K[z]-epimorphism $\mu\colon \Omega \to X_Z\colon \omega \mapsto [\omega]_Z$.
Clearly, $\omega \in [0]_Z$ = kernel $\mu$ iff $Z\cdot\omega = 0$ (see (6.6)). Equivalently,
$(\psi Z)\omega \equiv 0$ (mod $\psi$). Using the representation whose existence is claimed
by (6.9), write $\psi Z = C \Lambda D$ (C, $\Lambda$, D = matrices over K[z]). Define
$W = D^{-1}\Psi$, where

    $\Psi = \mathrm{diag}\,(\psi_1, \ldots, \psi_r, 1, \ldots, 1)$.

Then $\Lambda\Psi \equiv 0$, $(\psi Z)W \equiv 0$ (mod $\psi$), and W has clearly maximal
rank among K[z]-matrices with this property. So the columns of the matrix W
constitute a basis for kernel $\mu$. The rest follows easily, as in the proof
of (4.34). □

(6.13) REMARK. The preceding proof remains correct, without any
modification, if the representation $\psi Z = C \Lambda D$, det C, det D = units
is taken in the ring $K[z]/\psi K[z]$, rather than in K[z]. The former
representation follows trivially from the latter but may be easier to
compute.

(6.14) REMARK. Theorem (6.12) shows how to compute the invariant
factors of $X_Z$ from those of $\psi Z$. We must define the invariant
factors of Z to be the same as those of $X_Z$ (because of the
bijective correspondence $Z \leftrightarrow X_Z$). Consistency with (6.12) demands
that we write

(6.15)    $\lambda_i(Z) = (\lambda_i/\theta_i)/\psi_i(Z)$,    $\psi_i(Z) = \psi/\theta_i$,

where / is defined as in (6.3). In other words, the $\psi_i(Z)$ are
the denominators of the scalar transfer functions $\lambda_i/\psi$ after
cancellation of all common factors.

Theorems (4.34) and (6.12) do not fully reveal the significance
of invariant factors in dynamical systems. Nor is it convenient to
deduce all properties of matrix-invariant factors from the representation
theorem (6.9). It is interesting that the sharpened results we present
below are much in the spirit of the original work of WEIERSTRASS, H. J. S.
SMITH, KRONECKER, FROBENIUS, and HENSEL, as summarized in the well-known
monograph of MUTH [1899].

(6.16) DEFINITION. Let A, B be rectangular matrices over a unique
factorization domain R. A|B (read: A divides B) iff there are matrices
V, W (over R, of appropriate sizes) such that B = VAW.

This is of course just the usual definition of "divide" in a ring,
specialized to the noncommutative ring of matrices.

The following result [MUTH 1899, Theorems IIIa-b, p. 52] shows
that in case of principal-ideal domains the correspondence between
matrices and their invariant factors preserves the divide relation
(is "functorial" with respect to "divide"):

(6.17) THEOREM. Let R be a principal-ideal domain. Then A|B
if and only if $\lambda_i(A) \mid \lambda_i(B)$ for all i.

PROOF. Sufficiency. Write the representation (6.10) as

    $A = V_1 \Lambda_A W_1$,    $B = V_2 \Lambda_B W_2$.

By hypothesis, there is a $\Delta$ (diagonal) such that $\Lambda_B = \Delta\Lambda_A$. Hence

    $B = V_2 \Lambda_B W_2$,
      $= V_2 \Delta V_1^{-1} A W_1^{-1} W_2$,
      $= (V_2 \Delta V_1^{-1}) A (W_1^{-1} W_2)$.

Necessity. This is just the following

(6.18) LEMMA. For an arbitrary unique-factorization
domain R, A|B implies $\lambda_i(A) \mid \lambda_i(B)$.

PROOF. By elementary determinant manipulations, as in
MUTH [1899, Theorem II, p. 16-17]. □

This completes the proof of Theorem (6.17). □

(6.19) REMARK. Since (6.9) does not apply (why?) to unique
factorization domains, for purposes of using Lemma (6.18) we need WEIERSTRASS's
definition of invariant factors: if $\Delta_j(A)$ = greatest common factor of
all j × j minors of a matrix A, with $\Delta_0(A) = 1$, then
$\lambda_j(A) = \Delta_j(A)/\Delta_{j-1}(A)$. Of course, this definition can be shown to be
equivalent (over principal-ideal domains) to that implied by (6.9).
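
WEIERSTRASS's definition translates directly into a computation. The following sketch takes R to be the ring of integers (a principal-ideal domain chosen here purely for convenience; the text works over K[z]), and the function names are illustrative. It also spot-checks Lemma (6.18) on one pair A|B.

```python
from itertools import combinations
from math import gcd

def det(m):
    """Determinant by Laplace expansion along the first row (small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for c, entry in enumerate(m[0]):
        minor = [row[:c] + row[c + 1:] for row in m[1:]]
        total += (-1) ** c * entry * det(minor)
    return total

def determinantal_divisor(a, j):
    """Delta_j(A): gcd of all j x j minors of A, with Delta_0(A) = 1."""
    if j == 0:
        return 1
    g = 0
    for rows in combinations(range(len(a)), j):
        for cols in combinations(range(len(a[0])), j):
            g = gcd(g, det([[a[r][c] for c in cols] for r in rows]))
    return g

def invariant_factors(a):
    """lambda_j(A) = Delta_j(A) / Delta_{j-1}(A), for j up to rank A."""
    factors, previous = [], 1
    for j in range(1, min(len(a), len(a[0])) + 1):
        current = determinantal_divisor(a, j)
        if current == 0:
            break
        factors.append(current // previous)
        previous = current
    return factors

A = [[2, 4, 4], [-6, 6, 12], [10, 4, 16]]
B = [[2, 4, 4], [-18, 18, 36], [10, 4, 16]]   # B = V A with V = diag(1, 3, 1)
print(invariant_factors(A))                    # [2, 2, 156]
print(invariant_factors(B))
```

Each invariant factor of A divides the corresponding invariant factor of B, as Lemma (6.18) predicts for A|B.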

In analogy with Definition (6.16), let us agree (note inversion!) on

(6.20) DEFINITION. Let $Z_1$, $Z_2$ be transfer-function matrices.
$Z_1|Z_2$ (read: $Z_1$ divides $Z_2$) iff there are matrices V, W over K[z]
such that $Z_1 = V Z_2 W$. (Note that $Z_1|Z_2$ implies at once:
$\psi_{Z_1} \mid \psi_{Z_2}$.)

(6.21) THEOREM. $Z_1|Z_2$ if and only if $\psi_i(Z_1) \mid \psi_i(Z_2)$ for all i.

PROOF. This is the natural counterpart of Theorem (6.17),
and follows from it by a simple calculation using the definition of
$\psi_i(Z)$ given by (6.15). □
(6.22) DEFINITION. $\Sigma_1|\Sigma_2$ (read: $\Sigma_1$ can be simulated by
$\Sigma_2$) iff $X_{\Sigma_1}|X_{\Sigma_2}$, that is, iff $X_{\Sigma_1}$ is isomorphic to a
submodule of $X_{\Sigma_2}$ [or isomorphic to a quotient module of $X_{\Sigma_2}$].

This definition is also functorially related to the definition
of "divide" over a principal ideal domain R because of the following
standard result:

(6.23) THEOREM. Let R be a principal-ideal domain and X, Y
R-modules. Then Y is (isomorphic) to a submodule or quotient module
of X if and only if $\psi_i(Y) \mid \psi_i(X)$, $i = 1, \ldots, r(Y) \leq r(X)$.


PROOF. Sufficiency. Take both in canonical form (4.34), with
$x_1, \ldots, x_{r(X)}$ generating the cyclic pieces of X, and
$y_1, \ldots, y_{r(X)}$ (with $y_i = 0$ if i > r(Y)) those of Y. The
assignment $y_i \mapsto (\psi_i(X)/\psi_i(Y))\cdot x_i$ defines a monomorphism
$Y \to X$, that is, exhibits Y as (isomorphic to) a submodule of X.
Similarly, the assignment $x_i \mapsto y_i$ defines an epimorphism $X \to Y$
exhibiting Y as (isomorphic to) a quotient module of X.

Necessity (following BOURBAKI [Algèbre, Chapter 7 (2e éd.),
Section 4, Exercise 8]). Let Y be a submodule of X. By (4.34),
$X \approx L/N$ where L, N are free R-modules. By a classical isomorphism
theorem, Y is isomorphic to a quotient module M/N, where $L \supset M \supset N$
and M is free (since submodules of a free module are free).
From the last relation, $r(Y) \leq r(X)$. Now observe, again using (4.34),
that, for any R-module X and any $\pi \in R$,

    $r(\pi X) < k \iff \psi_k(X) \mid \pi$

and therefore

    $R\psi_k(X)$ = ideal generated by $\{\pi\colon r(\pi X) < k\}$.

Since $\pi Y$ is a submodule of $\pi X$ for all $\pi \in R$, it follows that
$R\psi_k(X) \subset R\psi_k(Y)$, and the proof is complete for the case when Y is
a submodule of X. The proof of the other case is similar. □

(6.24) COROLLARY. $\psi_i(Z_\Sigma) \mid \psi_i(X_\Sigma)$, $i = 1, \ldots, r(Z_\Sigma)$.

PROOF. Immediate from the fact that $X_{Z_\Sigma}$ is a submodule
of $X_\Sigma$ (see Section 7). □

Now we can summarize the main results of this section as the

(6.25) PRIME DECOMPOSITION THEOREM FOR LINEAR DYNAMICAL SYSTEMS.
The following conditions are equivalent:

(i) $Z_1$ divides $Z_2$.

(ii) $\psi_i(Z_1)$ divides $\psi_i(Z_2)$ for all i.

(iii) $\Sigma_{Z_1}$ can be simulated by $\Sigma_{Z_2}$.

PROOF. This follows by combining Theorem (6.21) with Theorem
(6.23), since $\psi_i(Z) = \psi_i(X_Z)$ by definition. □
(6.26) INTERPRETATION. The definition of $Z_1|Z_2$ means, in
system-theoretic terms, that the inputs and outputs of the machine whose
transfer function is $Z_2$ are to be "recoded": the original input $\omega_2$ is
replaced by an input $\omega_2 = B(z)\omega_1$ and the output $\gamma_2$ is replaced by
an output $\gamma_1 = A(z)\gamma_2$; with these "coding" operations, $\Sigma_2$ will act
like a machine with transfer function $Z_1$. In view of the definition of a
transfer function, the equation $Z_1 = A Z_2 B$ is always satisfied whenever
A, B are replaced by $\hat A$, $\hat B$ (reduced modulo $\psi_{Z_2}$). This means that
the coding operations can be carried out physically given a delay of
d = deg $\psi_{Z_2}$ units of time (or more). No feedback is involved in coding;
it is merely necessary to store the d last elements of the input and
output sequences. Hence, in view of Theorem (6.25) and Corollary (6.24),
we can say that it is possible to alter the dynamical behavior of a
system $\Sigma_2$ arbitrarily by external coding involving delay but not
feedback if and only if the invariant factors of the desired external
behavior ($Z_1$) are divisors of invariant factors of the external
behavior ($Z_{\Sigma_2}$) of the given system. The invariant factors may be
called the PRIMES of linear systems: they represent the atoms of system
behavior which cannot be simulated from smaller units using arbitrary
but feedback-free coding. In fact, there is a close (but not isomorphic)
relationship between the Krohn-Rhodes primes of automata theory (see
KALMAN, FALB, and ARBIB [1969, Chapters 7-9]) and ours. A full treatment
of this part of linear system theory will be published elsewhere.
7. ABSTRACT THEORY OF REALIZATIONS


The purpose of this short section is to review and expand those
portions of the previous discussion which are relevant to the detailed
theory of realizations to be presented in Sections 8 and 9. The same
issues are examined (from a different point of view) also in KALMAN,
FALB, and ARBIB [1969].

Let $f\colon \Omega \to \Gamma$ be a fixed input/output map. Let us recall the
construction of $X_f$, as a set and as carrying a K[z]-module structure
(Sections 3 and 4). It is clear that (i) $f = \iota_f \circ \mu_f$, where

    $\mu_f\colon \Omega \to X_f\colon \omega \mapsto [\omega]_f$,

    $\iota_f\colon X_f \to \Gamma\colon [\omega]_f \mapsto f(\omega)$

are K[z]-homomorphisms, and (ii) $\mu_f$ = epimorphism while $\iota_f$ = monomorphism.
We have also seen that

(7.1)    $\mu_f$ = epimorphism $\iff$ $X_f$ is completely reachable;
         $\iota_f$ = monomorphism $\iff$ $X_f$ is completely observable.

These facts set up a "functor" between system-theoretic notions and
algebra which characterize $X_f$ uniquely. Consequently, it is desirable
to replace also our system-theoretic definition of a realization (3.12)
by a purely algebraic one:

(7.2) DEFINITION. A realization of a K[z]-homomorphism $f\colon \Omega \to \Gamma$
is any factorization $f = \iota \circ \mu$, that is, any commutative diagram

    $\Omega \xrightarrow{\;\mu\;} X \xrightarrow{\;\iota\;} \Gamma$,    $\iota \circ \mu = f$,

of K[z]-homomorphisms. The K[z]-module X is called the state
module of the realization. A realization is canonical iff it is
completely reachable and completely observable, that is, $\mu$ is
surjective and $\iota$ is injective.

A realization always exists because we can take $X = \Omega$, $\mu = 1_\Omega$,
$\iota = f$ (or $X = \Gamma$, $\mu = f$, $\iota = 1_\Gamma$).

(7.3) REMARK. It is clear that a realization in the sense of (3.12)
can always be obtained from a realization given by (7.2). In fact,
define $\Sigma = (F, G, H)$ by

    $F\colon X \to X\colon x \mapsto z\cdot x$,

    G = $\mu$ restricted to the submodule $\{\omega\colon |\omega| = 1\}$,

    H = $\iota$ followed by the projection $\gamma \mapsto \gamma(1)$.

It is easily verified that these rules will define a system with
$f_\Sigma = f$. Given any such $\Sigma$, it is also clear that the rules

    $X = X_\Sigma$,

    $\mu\colon \omega \mapsto \sum_{t \leq 0} F_\Sigma^{-t} G_\Sigma\,\omega(t)$,

    $\iota\colon x \mapsto (H_\Sigma x, H_\Sigma F_\Sigma x, \ldots)$

define a factorization of f. Hence the correspondence between (3.12)
and (7.2) is bijective.
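
The factorization described in this remark can be traced numerically. The sketch below is an illustration with assumed conventions, not the author's procedure: `u[k]` stands for the input value at time t = -k, matrices are nested lists, and the helper names are hypothetical. It computes the output once through the state (first $\mu$, then $\iota$) and once by convolution with the matrices $H F^{k-1} G$.

```python
def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def mat_mul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def state_from_input(F, G, u):
    """mu: x = sum_k F^k G u[k], accumulated from the oldest input onward."""
    x = [0] * len(F)
    for uk in reversed(u):
        x = [a + b for a, b in zip(mat_vec(F, x), mat_vec(G, uk))]
    return x

def outputs_from_state(F, H, x, steps):
    """iota: x -> (H x, H F x, H F^2 x, ...), truncated to `steps` terms."""
    ys = []
    for _ in range(steps):
        ys.append(mat_vec(H, x))
        x = mat_vec(F, x)
    return ys

def markov_parameters(F, G, H, count):
    """A_k = H F^(k-1) G for k = 1, ..., count."""
    out, P = [], G
    for _ in range(count):
        out.append(mat_mul(H, P))
        P = mat_mul(F, P)
    return out

def outputs_by_convolution(A, u, steps):
    """y_j = sum_k A_(j+k) u[k] for j = 1, ..., steps (A[i] holds A_(i+1))."""
    ys = []
    for j in range(1, steps + 1):
        y = [0] * len(A[0])
        for k, uk in enumerate(u):
            y = [a + b for a, b in zip(y, mat_vec(A[j + k - 1], uk))]
        ys.append(y)
    return ys

F = [[0, 1], [-2, -3]]
G = [[0], [1]]
H = [[1, 0]]
u = [[1], [2], [0], [-1]]          # inputs at t = 0, -1, -2, -3

via_state = outputs_from_state(F, H, state_from_input(F, G, u), 5)
via_markov = outputs_by_convolution(markov_parameters(F, G, H, 5 + len(u)), u, 5)
print(via_state == via_markov)
```

Both routes give the same output sequence, which is the commutativity of the diagram in (7.2) restricted to finitely many output values.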

The quickest way to exploit the algebraic consequences of our
definition (7.2) is via the following arrow-theoretic fact:

(7.4) ZEIGER FILL-IN LEMMA. Let A, B, C, D be sets and $\alpha$, $\beta$, $\gamma$,
and $\delta$ set maps for which the following diagram commutes:

    $\begin{array}{ccc} A & \xrightarrow{\;\alpha\;} & B \\ {\scriptstyle\gamma}\downarrow & {\scriptstyle\varphi}\swarrow & \downarrow{\scriptstyle\delta} \\ C & \xrightarrow{\;\beta\;} & D \end{array}$

If $\alpha$ is surjective and $\beta$ is injective, there exists a unique set
map $\varphi$ corresponding to the dashed arrow which preserves commutativity.


This follows by straightforward "diagram-chasing", which proves
at the same time the

(7.5) COROLLARY. The claim of the lemma remains valid if "sets"
are replaced by "R-modules" and "set maps" by "R-homomorphisms".
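
For finite sets the diagram chase is short enough to write out. The sketch assumes the orientation $\alpha\colon A \to B$, $\gamma\colon A \to C$, $\beta\colon C \to D$, $\delta\colon B \to D$ with $\delta\alpha = \beta\gamma$ and the fill-in $\varphi\colon B \to C$; maps are represented as dictionaries and `fill_in` is a hypothetical helper name.

```python
def fill_in(domain, alpha, gamma):
    """Build phi: B -> C from the requirement phi(alpha(a)) = gamma(a).

    alpha must be onto B, so every b receives a value. When the square
    commutes and beta is injective, gamma is constant on each fibre of
    alpha, so the definition is consistent.
    """
    phi = {}
    for a in domain:
        b, c = alpha[a], gamma[a]
        if phi.setdefault(b, c) != c:
            raise ValueError("gamma is not constant on a fibre of alpha")
    return phi

A = [0, 1, 2, 3]
alpha = {a: a % 2 for a in A}                             # A -> B = {0, 1}, surjective
gamma = {a: "even" if a % 2 == 0 else "odd" for a in A}   # A -> C
beta = {"even": 10, "odd": 11}                            # C -> D, injective
delta = {0: 10, 1: 11}                                    # B -> D

phi = fill_in(A, alpha, gamma)
print(phi)
```

The resulting map satisfies both $\varphi\alpha = \gamma$ and $\beta\varphi = \delta$; uniqueness follows because $\alpha$ is onto, so the first condition already fixes $\varphi$ on all of B.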

Applying the module version of the lemma twice, we get

(7.6) PROPOSITION. Consider any two canonical realizations of a
fixed f: the corresponding state-sets are isomorphic as K[z]-modules.

Since every K[z]-module is automatically also a K-vector space, (7.6)
shows that the two state sets are K-isomorphic, that is, have the same
dimension as vector spaces. The fact that they are also K[z]-isomorphic
implies, via Theorem (4.34), that they have the same invariant factors.

We have already employed the convention that (in view of the bijection
between f and $\Sigma_f$), the invariant factors of f and $X_f$ are to be
identified. In view of (7.6), this is now a general fact, not dependent
on the special construction used to get $X_f$. We can therefore restate
(7.6) as the

(7.7) ISOMORPHISM THEOREM FOR CANONICAL REALIZATIONS. Any two
canonical realizations of a fixed f have isomorphic state modules.
The state module of a canonical realization is uniquely characterized
(up to isomorphism) by its invariant factors, which may be also viewed
as those of f.

A simple exercise proves also

(7.8) PROPOSITION. If X is the state module of a canonical
realization of f, then dim X (as a vector space) is minimum in the
class of all realizations of f.

This result has been used in some of the literature to justify
the terminology "minimal realization" as equivalent to "canonical
realization". We shall see in Section 9 that the two notions are
not always equivalent; we prefer to view (7.2) as the basic definition
and (7.8) as a derived fact.

(7.9) REMARK. Theorem (7.7) constitutes a proof of the previously
claimed (4.24). To be more explicit: if $\Sigma = (F, G, H)$ and
$\hat\Sigma = (\hat F, \hat G, \hat H)$ are two triples of matrices defining canonical
realizations of the same f, then (7.7) implies the existence of a
vector-space isomorphism $A\colon X \to \hat X$ such that

          $\hat F = A F A^{-1}$,
(7.10)    $\hat G = A G$,
          $\hat H = H A^{-1}$.

If we identify X and $\hat X$ then A is simply a basis change and it
follows that the class of all matrix triples which are canonical
realizations of a fixed f is isomorphic with the general linear
group over X.
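
The relations (7.10) are easy to test numerically: two triples related in this way produce the same matrices $H F^{k} G$ and hence the same input/output map. The matrices below are arbitrary illustrative choices, A is taken with determinant 1 so that its inverse has integer entries, and the helper names are hypothetical.

```python
def mat_mul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def markov(F, G, H, count):
    """[H G, H F G, H F^2 G, ...], the first `count` terms."""
    out, P = [], G
    for _ in range(count):
        out.append(mat_mul(H, P))
        P = mat_mul(F, P)
    return out

F = [[0, 1], [-2, -3]]
G = [[0], [1]]
H = [[1, 0]]

A = [[1, 1], [0, 1]]          # basis change with det A = 1
A_inv = [[1, -1], [0, 1]]

F_hat = mat_mul(mat_mul(A, F), A_inv)   # A F A^-1
G_hat = mat_mul(A, G)                   # A G
H_hat = mat_mul(H, A_inv)               # H A^-1

print(markov(F, G, H, 6) == markov(F_hat, G_hat, H_hat, 6))
```

The comparison confirms only the easy direction (a basis change does not alter f); the content of (7.7) is the converse, for canonical realizations.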


The actual computation of a canonical realization, that is,
of the abstract Nerode equivalence classes $[\omega]_f$, requires a considerable
amount of applied-mathematical machinery, which will be developed
in the next section. The critical hypothesis is the existence of
a factorization of f such that dim X < $\infty$. (This is sometimes
expressed by saying that f has finite rank.) Given any such realization,
it is possible to obtain a canonical one by a process of
reduction. More precisely, we have


(7.11)  THEOREM.  Every  realization  of  f  with  state  module  X 
contains  a  subquotient  (a  quotient  of  a  submodule,  or  equivalently, 
a  submodule  of  a  quotient)  X#  of  X  which  is  the  state-module  of 
a  canonical  realization  of  f  '. 


PROOF.  The  reachable  states  X^  =  image  \x  are  a  submodule 
of  X  and  so  are  the  unobservable  states  X^  =  kernel  t.  Hence 
X*  w  Xr/Xr  O  Xq  is  a  subquotient  of  X.  It  follows  immediately  that 
X^  is  a  canonical  state-module  for  f.  [The  proof  may  be  visualized 
via  the  following  commutative  diagram,  where  the  J  * s  and  p‘s  are 
canonical  injections  and  projections.]  □ 
(7.12) REMARK. Since any subquotient of X is isomorphic to a
submodule (or a quotient module) of X, it follows from Theorem (6.23)
that X can be the state module of a realization only if
$\psi_i(f) \mid \psi_i(X)$ for all i (recall also Corollary (6.24)). This
condition, however, is not enough since the $\psi_i$ are invariants of
module isomorphisms and not isomorphisms of the commutative diagram (7.2).

The  preceding  discussion  should  be  kept  in  mind  to  gain  an  over¬ 
view  of  the  algorithms  to  be  developed  in  the  next  sections. 
8. CONSTRUCTION OF REALIZATIONS

Now we shall develop and generalize the basic algorithm, originally
due to B. L. Ho (see HO and KALMAN [1966]), for computing a canonical
realization $\Sigma = (F, G, H)$ of a given input/output map f. Most of
the discussion will be in the language of matrix algebra.

Notations.  Here  and  in  Section  9  boldface  capital  letters*  will 
denote  block  matrices  or  sequences  of  matrices;  finite  block  matrices 
will  be  denoted  by  small  Greek  subscripts  on  boldface  capitals;  the 
elements  of  such  matrices  will  be  denoted  by  ordinary  capitals.  This 
is  intended  to  make  the  practical  aspects  of  the  computations  self- 
evident;  no  further  explanations  will  be  made. 

Let $f: \Omega \to \Gamma$ be a given, fixed K[z]-homomorphism. Using only
the K-linearity of f we have that

(8.1)    $f(\omega)(1) = \sum_{t \le 0} A_{-t+1}\,\omega(t),$

where the $A_k$ (k > 0) are p × m matrices over the fixed field K.

We denote the totality of these matrices by

    $\mathbf{A}(f) = (A_1, A_2, \dots).$

Then it is clear that the specification of a K[z]-homomorphism f
is equivalent to the specification of its matrix sequence $\mathbf{A}(f)$.
Moreover, if $\Sigma$ realizes f, (8.1) can be written explicitly as

(8.2)    $f(\omega)(1) = \sum_{t \le 0} H F^{-t} G\,\omega(t).$

*Note  to  Printer:  Indicated  by  double  underline. 
Comparing (8.1) and (8.2) we can translate (3.12) into an equivalent
matrix-language

(8.3) DEFINITION. A dynamical system $\Sigma = (F, G, H)$ realizes a
(matrix) infinite sequence $\mathbf{A}$ iff the relation

    $A_{k+1} = H F^k G, \qquad k = 0, 1, 2, \dots$

is satisfied.


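Let us now try to obtain also a matrix criterion for an infinite
sequence $\mathbf{A}$ to have a finite-dimensional realization. The simplest
way to do that is to first write down a matrix representation for the
map $f: \Omega \to \Gamma$. So let

    $\mathbf{H}(\mathbf{A}) = \begin{bmatrix} A_1 & A_2 & A_3 & \cdots \\ A_2 & A_3 & A_4 & \cdots \\ A_3 & A_4 & A_5 & \cdots \\ \vdots & \vdots & \vdots & \end{bmatrix}$

and verify that $\mathbf{H}(\mathbf{A}(f))$ represents f when $\omega \in \Omega$ is viewed as an
∞ column vector with elements $(\omega_1(0), \dots, \omega_m(0), \omega_1(-1), \dots)$.
Classically, $\mathbf{H}(\mathbf{A})$ is known as the (infinite) Hankel matrix associated
with $\mathbf{A}$. We denote by $\mathbf{H}_{\mu,\nu}$ the μ × ν block submatrix of $\mathbf{H}$
appearing in the upper left-hand corner of $\mathbf{H}$.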

(8.4) PROPOSITION. Let $\Sigma$ be any realization of $\mathbf{A}$. Then
rank $\mathbf{H}_{\mu,\nu}(\mathbf{A}) \le \dim \Sigma$ for all μ, ν ≥ 1.

(8.5) COROLLARY. An infinite sequence $\mathbf{A}$ has a finite-dimensional
realization only if rank $\mathbf{H}_{\mu,\nu}(\mathbf{A})$ is constant for all μ, ν
sufficiently large.

PROOF. If dim Σ = ∞, the claim of the proposition is
vacuous (although formally correct!). Assume therefore that dim Σ < ∞
and define from Σ the finite block matrices

    $\mathbf{R}_\nu = [G,\ FG,\ \dots,\ F^{\nu-1}G]$

and

    $\mathbf{O}_\mu = [H',\ H'F',\ \dots,\ H'(F')^{\mu-1}]'.$

Then

    $\mathbf{O}_\mu \mathbf{R}_\nu = \mathbf{H}_{\mu,\nu}(\mathbf{A})$

by the definition (8.3) of a realization. It is clear that rank $\mathbf{R}_\nu$
and rank $\mathbf{O}_\mu$ are at most n = dim Σ. Thus our claim is reduced to
the standard matrix fact

    rank (AB) ≤ min {rank A, rank B}. □
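
The factorization used in the proof of (8.4) can be tried out on a small case. The sketch below builds the block Hankel matrix from the sequence $A_{k+1} = HF^kG$, forms the two factors, and compares. It is an illustration only: the example triple and the helper names (`markov`, `hankel`, `reach`, `obsv`, `rank`) are assumptions, and the rank is computed exactly over the rationals rather than over a general field K.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def rank(M):
    """Rank over the rationals, by Gauss-Jordan elimination."""
    rows = [[Fraction(x) for x in row] for row in M]
    ncols = len(rows[0]) if rows else 0
    rk = 0
    for c in range(ncols):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][c] != 0:
                factor = rows[i][c] / rows[rk][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def markov(F, G, H, count):
    """Return [A_1, ..., A_count] with A_{k+1} = H F^k G."""
    out, FkG = [], G
    for _ in range(count):
        out.append(matmul(H, FkG))
        FkG = matmul(F, FkG)
    return out

def hankel(seq, mu, nu):
    """mu x nu block Hankel matrix; seq[k] holds A_{k+1}."""
    p = len(seq[0])
    return [sum((seq[i + j][r] for j in range(nu)), [])
            for i in range(mu) for r in range(p)]

def reach(F, G, nu):
    """[G, FG, ..., F^(nu-1) G], blocks placed side by side."""
    blocks, FkG = [], G
    for _ in range(nu):
        blocks.append(FkG)
        FkG = matmul(F, FkG)
    return [sum((b[i] for b in blocks), []) for i in range(len(G))]

def obsv(F, H, mu):
    """[H; HF; ...; H F^(mu-1)], blocks stacked vertically."""
    rows, HFk = [], H
    for _ in range(mu):
        rows.extend(HFk)
        HFk = matmul(HFk, F)
    return rows

# Illustrative realization (hypothetical) of dimension n = 2.
F = [[0, 1], [-2, 3]]
G = [[0], [1]]
H = [[1, 0]]

seq = markov(F, G, H, 6)
H34 = hankel(seq, 3, 4)
print(matmul(obsv(F, H, 3), reach(F, G, 4)) == H34)  # True
print(rank(H34))                                     # 2
```

Here the rank equals dim Σ = 2, so the bound of (8.4) is attained for this particular triple.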

Our next objective is the proof of the converse of the corollary. This can be
done in several ways. The original proof is due to HO and KALMAN [1966];
similar results were obtained independently and concurrently by YOULA
and TISSI [1966] as well as by SILVERMAN [1966]. Two different proofs
are analyzed and compared in KALMAN, FALB, and ARBIB [1969, Chapter 10,
Section 11]. All proofs depend on certain finiteness arguments. We
shall give here a variant of the proof developed in HO and KALMAN [1969].


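(8.6) DEFINITION. The infinite Hankel matrix $\mathbf{H}$ associated with
the sequence $\mathbf{A}$ has finite length $\lambda = (\lambda', \lambda'')$ iff one of the
following two equivalent conditions holds:

    $\lambda' = \min\{\ell' : \operatorname{rank} \mathbf{H}_{\ell',\nu} = \operatorname{rank} \mathbf{H}_{\ell'+\kappa,\nu} \text{ for all } \kappa, \nu = 1, 2, \dots\} < \infty$

or

    $\lambda'' = \min\{\ell'' : \operatorname{rank} \mathbf{H}_{\mu,\ell''} = \operatorname{rank} \mathbf{H}_{\mu,\ell''+\kappa} \text{ for all } \kappa, \mu = 1, 2, \dots\} < \infty.$

$\lambda'$ is the row length of $\mathbf{H}$ and $\lambda''$ is the column length of $\mathbf{H}$.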

The  equivalence  of  the  two  conditions  is  immediate  from  the 
equality  of  the  row  rank  and  column  rank  of  a  finite  matrix.  The  proof 
of  the  following  result  (not  needed  in  the  sequel)  is  left  for  the  reader 
as  an  exercise  in  familiarizing  himself  with  the  special  pattern  of  the 
elements  of  a  Hankel  matrix: 

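(8.7) PROPOSITION. For any $\mathbf{H}$, the following inequalities are
either both true [$\mathbf{H}$ has finite length] or both false [otherwise]:

    $\lambda' \le \operatorname{rank} \mathbf{H}_{\lambda',\lambda''} \le m\lambda'',$

    $\lambda'' \le \operatorname{rank} \mathbf{H}_{\lambda',\lambda''} \le p\lambda'.$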

The most direct consequence of the finiteness condition given by
(8.6) is the existence of a finite-dimensional representation $\mathbf{S}$ and
$\mathbf{Z}$ of the shift operator $\sigma_A$ acting on a sequence $\mathbf{A}$. The "operand"
will be the Hankel matrix associated with a given $\mathbf{A}$. As we shall see
soon, this representation of the shift operator induces a rule for
computing the matrix F of a realization of $\mathbf{A}$. This is exactly what
we would expect: module theory tells us that, loosely speaking,

    $\mathbf{S},\ \mathbf{Z} \sim \sigma_A \sim z \sim F.$

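(8.7) DEFINITION. The shift operator $\sigma_A$ on an infinite sequence
$\mathbf{A}$ is given by

    $\sigma_A^k : (A_1, A_2, \dots) \mapsto (A_{1+k}, A_{2+k}, \dots);$

the corresponding shift operator on Hankel matrices is then

    $\sigma_H^k : \mathbf{H}(\mathbf{A}) \mapsto \mathbf{H}(\sigma_A^k \mathbf{A}).$

(Of course, $\sigma_H$ is well-defined also on submatrices of a Hankel matrix.)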

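(8.8) MAIN LEMMA. A Hankel matrix $\mathbf{H}$ associated with an infinite
sequence $\mathbf{A}$ has finite length if and only if the shift operator $\sigma_H$
has finite-dimensional left and right matrix representations. Precisely:
$\mathbf{H}$ has finite length $\lambda = (\lambda', \lambda'')$ if and only if there exist $\ell' \times \ell'$
and $\ell'' \times \ell''$ block matrices $\mathbf{S}$ and $\mathbf{Z}$ such that

(8.9)    $\sigma_H^k \mathbf{H}_{\ell',\ell''}(\mathbf{A}) = \mathbf{S}^k \mathbf{H}_{\ell',\ell''}(\mathbf{A}) = \mathbf{H}_{\ell',\ell''}(\mathbf{A})\, \mathbf{Z}^k,$

and furthermore the minimum size of these matrices satisfying (8.9) is
$\lambda' \times \lambda'$ and $\lambda'' \times \lambda''$.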
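
Relation (8.9) is easy to observe on a scalar example. The sketch below assumes the hypothetical sequence $A_k = 2^{k-1} - 1$ (so 0, 1, 3, 7, ...), which satisfies $A_{j+3} = -2A_{j+1} + 3A_{j+2}$; Z is the companion matrix of that recurrence and S is its transpose. The sequence, the matrices, and the helper names are assumptions chosen for illustration, not taken from the text.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def matpow(M, k):
    """k-th power of a square matrix, with M^0 the identity."""
    n = len(M)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, M)
    return result

def hankel(seq, mu, nu, shift=0):
    """Scalar mu x nu Hankel matrix of the sequence shifted `shift` times."""
    return [[seq[i + j + shift] for j in range(nu)] for i in range(mu)]

# Hypothetical scalar sequence: seq[k] holds A_{k+1} = 2^k - 1.
seq = [2 ** k - 1 for k in range(10)]

S = [[0, 1], [-2, 3]]    # left representation of the shift
Z = [[0, -2], [1, 3]]    # right representation (block companion form)

H22 = hankel(seq, 2, 2)
checks = []
for k in range(5):
    shifted = hankel(seq, 2, 2, shift=k)   # sigma_H^k applied to H22
    checks.append(matmul(matpow(S, k), H22) == shifted
                  and matmul(H22, matpow(Z, k)) == shifted)
print(all(checks))  # True
```

In this example both representations have size 2 × 2, in agreement with $\lambda' = \lambda'' = 2$ for the chosen sequence.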

PROOF. Sufficiency. Take any $\ell'' \times \ell''$ block matrix $\mathbf{Z}$
which satisfies (8.9). Compute the last column of $\mathbf{H}_{\ell',\ell''}\mathbf{Z}$:

(8.10)    $A_{j+\ell''+1} = A_{j+1} Z_{1\ell''} + A_{j+2} Z_{2\ell''} + \cdots + A_{j+\ell''} Z_{\ell''\ell''}$

for all j = 0, 1, ... (where $Z_{\mu\nu}$ is the (μ, ν) element block
of $\mathbf{Z}$). Relation (8.10) proves that

    $\operatorname{rank} \mathbf{H}_{\kappa+1,\ell''} = \operatorname{rank} \mathbf{H}_{\kappa+1,\ell''+1}$ for all κ = 0, 1, ... ;

the general case follows by repetition of the same argument. Hence the
existence of the claimed $\mathbf{Z}$ implies that the column length $\lambda''$ of $\mathbf{H}$
cannot exceed the size of $\mathbf{Z}$. If actually $\lambda''$ is smaller than the size
of the smallest $\mathbf{Z}$ which works in (8.9), we get a contradiction from
the necessity part of the proof. The claims concerning $\mathbf{S}$ are proved
by a strictly dual argument.

Necessity. By the definition of $\lambda''$, each column of the
$(\lambda'' + 1)$-th block column of $\mathbf{H}_{\mu,\lambda''+1}$ is linearly dependent on the
columns of the preceding block columns of $\mathbf{H}_{\mu,\lambda''+1}$; moreover, this
property is true for all integers μ, no matter how large. So there
exist m × m matrices $Z_1, \dots, Z_{\lambda''}$ such that the relation

(8.11)    $A_{j+1} Z_1 + A_{j+2} Z_2 + \cdots + A_{j+\lambda''} Z_{\lambda''} = A_{j+1+\lambda''}$

holds identically for all j = 0, 1, ... . Now define $\mathbf{Z}$ to be a
$\lambda'' \times \lambda''$ block companion matrix of m × m blocks made up from the
$Z_i$ just defined:
Then, for all k > 0, computi

=  °hSi,  a"Sa",i; 

the second step uses (8.9). By definition of $\sigma_H$ and E, the last
matrix is just the (1, 1)-th element of $\mathbf{H}(\sigma_A^k(\mathbf{A}))$, namely $A_{k+1}$.
Hence the given Σ is a realization of $\mathbf{A}$.

Necessity. This is immediate from Corollary (8.5). □

Now  we  want  to  attack  the  problem  of  finding  a  canonical  realiza¬ 
tion  of  A,  since  the  realization  given  by  (8.13)  is  usually  very  far 
from  canonical.  Our  succeeding  considerations  here  and  in  Section  9 
are  made  more  transparent  if  we  digress  for  a  moment  to  establish 
another  consequence  of  (8.8). 

By  outrageous  abuse  of  language,  we  shall  say  that  A  has  finite 
length  iff  H(A)  has  finite  length.  We  note. 

(8.14) DEFINITION. An infinite sequence $\mathbf{B}$ is an extension of
order N of (the initial part of) an infinite sequence $\mathbf{A}$ iff
$A_k = B_k$ for k = 1, ..., N.

(8.15) THEOREM. No infinite sequence of finite length $(\lambda', \lambda'')$
has distinct length-preserving extensions of any order $N \ge \lambda' + \lambda''$.


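PROOF. Suppose $\mathbf{B}$ is a length-preserving extension of order
N of $\mathbf{A}$, the length of both sequences being $(\lambda', \lambda'')$, with $N \ge \lambda' + \lambda''$.
By (8.8), both sequences satisfy relation (8.9), with suitable $\mathbf{S}_A$ and $\mathbf{Z}_B$.
The sequence $\mathbf{A}$ is uniquely determined by $\mathbf{S}_A$ acting on $\mathbf{H}_{\lambda',\lambda''}(\mathbf{A})$
from the left and the sequence $\mathbf{B}$ is uniquely determined by $\mathbf{Z}_B$
acting on the matrix $\mathbf{H}_{\lambda',\lambda''}(\mathbf{B})$ from the right. The two matrices
are equal by hypothesis on N. Moreover,

    $\mathbf{S}_A \mathbf{H}_{\lambda',\lambda''}(\mathbf{A}) = \sigma_A \mathbf{H}_{\lambda',\lambda''}(\mathbf{A})$

and

    $\mathbf{H}_{\lambda',\lambda''}(\mathbf{B})\, \mathbf{Z}_B = \sigma_B \mathbf{H}_{\lambda',\lambda''}(\mathbf{B})$

are also equal, since the matrices on the right-hand side depend only
on the 2nd, ..., $(\lambda' + \lambda'')$-th member of each sequence. Using only this fact
and the associativity of the matrix product,

    $\mathbf{H}_{\lambda',\lambda''}(\mathbf{B})\, \mathbf{Z}_B^k = \mathbf{S}_A \mathbf{H}_{\lambda',\lambda''}(\mathbf{A})\, \mathbf{Z}_B^{k-1} = \cdots = \mathbf{S}_A^k \mathbf{H}_{\lambda',\lambda''}(\mathbf{A}).$

So $\mathbf{A} = \mathbf{B}$. □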


Now we can hope for a realization algorithm which uses only the
first $\lambda' + \lambda''$ terms of a sequence of finite length. In fact, we have


(8.16) B. L. HO'S REALIZATION ALGORITHM. Consider any infinite
sequence $\mathbf{A}$ of finite length with associated Hankel matrix $\mathbf{H}$. The
following steps will lead to a canonical realization of $\mathbf{A}$:
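(i) Determine $\lambda'$, $\lambda''$.

(ii) Compute n = rank $\mathbf{H}_{\lambda',\lambda''}$; in doing so, determine
nonsingular $p\lambda' \times p\lambda'$ and $m\lambda'' \times m\lambda''$ matrices P, Q such that

(8.17)    $P\, \mathbf{H}_{\lambda',\lambda''}\, Q = \begin{bmatrix} I_n & 0 \\ 0 & 0 \end{bmatrix}.$

(iii) Compute

(8.18)    $F = R_n P\, [\sigma_H \mathbf{H}_{\lambda',\lambda''}]\, Q\, C^n,$
          $G = R_n P\, \mathbf{H}_{\lambda',\lambda''}\, C^m,$
          $H = R_p\, \mathbf{H}_{\lambda',\lambda''}\, Q\, C^n,$

where $R_p$, $C^m$ are idempotent "editing" matrices corresponding to the
operations "retain only the first p rows" and "retain only the first
m columns".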
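
Steps (i)-(iii) can be sketched in a few lines. The version below is a minimal illustration under stated assumptions, not a definitive implementation: it works over the rationals, it takes $\lambda'$ and $\lambda''$ as given inputs `lam1` and `lam2`, and it obtains P and Q by plain row and column elimination, which is only one of many ways to reach the normal form (8.17). All function names are hypothetical.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def transpose(M):
    return [list(col) for col in zip(*M)]

def row_reduce(M):
    """Return (P, R, rank) with P nonsingular and R = P M in reduced echelon form."""
    R = [[Fraction(x) for x in row] for row in M]
    P = [[Fraction(int(i == j)) for j in range(len(R))] for i in range(len(R))]
    rk = 0
    for c in range(len(R[0])):
        pivot = next((i for i in range(rk, len(R)) if R[i][c] != 0), None)
        if pivot is None:
            continue
        R[rk], R[pivot] = R[pivot], R[rk]
        P[rk], P[pivot] = P[pivot], P[rk]
        scale = R[rk][c]
        R[rk] = [x / scale for x in R[rk]]
        P[rk] = [x / scale for x in P[rk]]
        for i in range(len(R)):
            if i != rk and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[rk])]
                P[i] = [a - f * b for a, b in zip(P[i], P[rk])]
        rk += 1
    return P, R, rk

def hankel(seq, mu, nu, shift=0):
    """mu x nu block Hankel matrix of the shifted sequence; seq[k] holds A_{k+1}."""
    p = len(seq[0])
    return [sum((seq[i + j + shift][r] for j in range(nu)), [])
            for i in range(mu) for r in range(p)]

def ho_realization(seq, lam1, lam2):
    """Formulas (8.17-18); uses only the first lam1 + lam2 terms of seq."""
    p, m = len(seq[0]), len(seq[0][0])
    Hk = hankel(seq, lam1, lam2)
    sHk = hankel(seq, lam1, lam2, shift=1)          # sigma_H applied to Hk
    P, R, n = row_reduce(Hk)                        # P Hk = R
    Q = transpose(row_reduce(transpose(R))[0])      # R Q = [[I_n, 0], [0, 0]]
    F = [row[:n] for row in matmul(matmul(P, sHk), Q)[:n]]
    G = [row[:m] for row in matmul(P, Hk)[:n]]
    H = [row[:n] for row in matmul(Hk, Q)[:p]]
    return F, G, H

def markov(F, G, H, count):
    """Return [H G, H F G, ..., H F^(count-1) G]."""
    out, FkG = [], G
    for _ in range(count):
        out.append(matmul(H, FkG))
        FkG = matmul(F, FkG)
    return out

# Hypothetical scalar data: A_k = 2^(k-1) - 1, with lambda' = lambda'' = 2.
data = [[[x]] for x in [0, 1, 3, 7]]
F, G, H = ho_realization(data, 2, 2)
print([blk[0][0] for blk in markov(F, G, H, 6)] == [0, 1, 3, 7, 15, 31])  # True
```

Only four terms of the sequence enter the computation, yet the resulting triple reproduces the later terms 15 and 31 as well, which is what the algorithm promises for a sequence of finite length.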


We  claim  the 


(8.19) REALIZATION THEOREM FOR INFINITE SEQUENCES. For any infinite
sequence $\mathbf{A}$ whose associated Hankel matrix $\mathbf{H}$ has finite length
$(\lambda', \lambda'')$, B. L. Ho's formulas (8.17-18) yield a canonical realization.


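PROOF. If Σ defined by (8.17-18) is a realization of $\mathbf{A}$,
then it is canonical, since its dimension n = rank $\mathbf{H}_{\lambda',\lambda''}$ cannot be
lowered, by Proposition (8.4). To verify the realization property, write
$\mathbf{H} = \mathbf{H}_{\lambda',\lambda''}$ and $\mathbf{H}^\# = Q C^n R_n P$. By (8.17), $\mathbf{H}^\#$ is a
pseudo-inverse of $\mathbf{H}$, that is, $\mathbf{H}\mathbf{H}^\#\mathbf{H} = \mathbf{H}$. Then, by definition
of F, G, H, and $\mathbf{H}^\#$,

    $H F^k G = (R_p \mathbf{H} Q C^n)(R_n P [\sigma_H \mathbf{H}] Q C^n)^k (R_n P \mathbf{H} C^m)$

    $\phantom{H F^k G} = R_p \mathbf{H}\mathbf{H}^\# ([\sigma_H \mathbf{H}]\mathbf{H}^\#)^k \mathbf{H} C^m;$

by repeated application of (8.9),

    $\phantom{H F^k G} = R_p \mathbf{H}\mathbf{H}^\# (\mathbf{S}\mathbf{H}\mathbf{H}^\#)^k \mathbf{H} C^m$

    $\phantom{H F^k G} = R_p \mathbf{S}^k \mathbf{H}\mathbf{H}^\#\mathbf{H} C^m$

    $\phantom{H F^k G} = R_p \mathbf{S}^k \mathbf{H} C^m$

    $\phantom{H F^k G} = R_p [\sigma_H^k \mathbf{H}] C^m.$

The last equation calls for picking out the first p rows and the
first m columns of $\sigma_H^k \mathbf{H}$, which is just $A_{k+1}$, as required. □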


(8.20) COMMENT. This is a considerably sharper result than Theorem
(8.12), in two respects:

(i) It is no longer necessary to compute $\mathbf{Z}$: we simply
use the matrix $\mathbf{H}_{\lambda',\lambda''}(\sigma_A \mathbf{A})$, which is part of the data of the problem.

(ii) Formulas (8.18) give the desired realization in minimal
form: there is no need to reduce (8.13) to a minimal realization (recall
here (7.11)).

Notice  also  that  the  proof  of  (8.19)  does  not  require  (8.12) 
but  depends  (just  like  the  latter)  on  direct  use  of  (8.8). 
An apparently serious limitation of the algorithm (8.16) is the
necessity to verify abstractly that "$\mathbf{A}$ has finite length". Of
course, this can be done only on the basis of certain special hypotheses
on $\mathbf{A}$, given in advance. (Examples: (i) $A_k = 0$ for all k > q;
(ii) $A_k$ = coefficients of the Taylor expansion of a rational function.)
Fortunately, the difficulty is only apparent, for the preceding
developments can be sharpened further:

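(8.21) FUNDAMENTAL THEOREM OF LINEAR REALIZATION THEORY. Consider
any infinite sequence $\mathbf{A}$ and the corresponding Hankel matrix $\mathbf{H}$.
Suppose there exist integers $\ell'$, $\ell''$ such that

(8.22)    $\operatorname{rank} \mathbf{H}_{\ell',\ell''}(\mathbf{A}) = \operatorname{rank} \mathbf{H}_{\ell'+1,\ell''}(\mathbf{A}) = \operatorname{rank} \mathbf{H}_{\ell',\ell''+1}(\mathbf{A}).$

Then there exists a unique extension $\hat{\mathbf{A}}$ of $\mathbf{A}$ of order $\ell' + \ell''$
such that $\lambda'_{\hat A} \le \ell'$ and $\lambda''_{\hat A} \le \ell''$; moreover, applying formulas (8.17-18)
with $\lambda' = \ell'$, $\lambda'' = \ell''$ gives a canonical realization of $\hat{\mathbf{A}}$.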

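PROOF. Exactly as in the necessity part of the proof of
(8.8), condition (8.22) implies the existence of $\mathbf{S}$ and $\mathbf{Z}$ such that

(8.23)    $\sigma_H \mathbf{H}_{\ell',\ell''}(\mathbf{A}) = \mathbf{S}\, \mathbf{H}_{\ell',\ell''}(\mathbf{A}) = \mathbf{H}_{\ell',\ell''}(\mathbf{A})\, \mathbf{Z}.$

Define an extension $\hat{\mathbf{A}}$ of $\mathbf{A}$ of order $\ell' + \ell''$ by

    $\hat A_k = R_p\, \mathbf{S}^{k-1} \mathbf{H}_{\ell',\ell''}(\mathbf{A})\, C^m, \qquad k \ge 1.$

By repeated application of (8.23), it follows that we have also

    $\sigma_H^k \mathbf{H}_{\ell',\ell''}(\hat{\mathbf{A}}) = \mathbf{S}^k \mathbf{H}_{\ell',\ell''}(\hat{\mathbf{A}}) = \mathbf{H}_{\ell',\ell''}(\hat{\mathbf{A}})\, \mathbf{Z}^k, \qquad k \ge 0.$

Now it is clear, from (8.8), that $\lambda'_{\hat A} \le \ell'$ and $\lambda''_{\hat A} \le \ell''$. The
uniqueness of the extension follows immediately from (8.15). Moreover,
Theorem (8.19) is still valid, even though $(\ell', \ell'')$ is not necessarily
minimal, because the proof of (8.19) depended only on (8.9) and not on
the minimality of $(\ell', \ell'')$. □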


Theorem (8.21) says, in effect, that a canonical realization of
some extension of $\mathbf{A}$ is always possible as soon as (8.22) is satisfied.
Moreover, (8.22) can be used as a practical criterion for constructing
by trial and error a canonical realization of any $\mathbf{A}$ known to have
finite length (but without being given $\lambda'$, $\lambda''$).
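
The trial-and-error use of (8.22) can be sketched as a search over pairs $(\ell', \ell'')$. The code below is an illustration under assumptions: it tries pairs in order of increasing $\ell' + \ell''$ (the text does not prescribe a search order), it computes ranks over the rationals, and the names `rank_condition_holds` and `find_lengths` are hypothetical.

```python
from fractions import Fraction

def rank(M):
    """Rank over the rationals, by Gauss-Jordan elimination."""
    rows = [[Fraction(x) for x in row] for row in M]
    ncols = len(rows[0]) if rows else 0
    rk = 0
    for c in range(ncols):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][c] != 0:
                factor = rows[i][c] / rows[rk][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def hankel(seq, mu, nu):
    """mu x nu block Hankel matrix; seq[k] holds A_{k+1}."""
    p = len(seq[0])
    return [sum((seq[i + j][r] for j in range(nu)), [])
            for i in range(mu) for r in range(p)]

def rank_condition_holds(seq, l1, l2):
    """Check (8.22) for the pair (l1, l2); needs the first l1 + l2 terms."""
    base = rank(hankel(seq, l1, l2))
    return (base == rank(hankel(seq, l1 + 1, l2))
            and base == rank(hankel(seq, l1, l2 + 1)))

def find_lengths(seq):
    """First pair (l1, l2) with l1 + l2 <= len(seq) satisfying (8.22), else None."""
    for total in range(2, len(seq) + 1):
        for l1 in range(1, total):
            l2 = total - l1
            if rank_condition_holds(seq, l1, l2):
                return l1, l2
    return None

def scalars(values):
    """Wrap numbers as 1 x 1 blocks."""
    return [[[v]] for v in values]

print(find_lengths(scalars([0, 1, 3, 7, 15, 31])))  # (2, 2)
print(find_lengths(scalars([0, 1])))                # None
```

The second call returns `None` because two terms are not enough data for any pair to pass the test; this is the situation taken up under the heading of partial realizations.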


(8.24) EXAMPLES. (i) There is no scalar (p = m = 1) infinite sequence
$\mathbf{A}$ for which (8.22) is never satisfied.

(ii) If $\mathbf{H}_{\ell',\ell''}$ is square and has full rank (for instance,
in the scalar case), then (8.22) is automatically satisfied.

(iii) If the algorithm (8.16) is applied without any information
concerning condition (8.22), the system Σ defined by (8.18) will
always realize some extension of $\mathbf{A}$, at least of order 1. It is not
known, however, how to get a simple formula which would determine the
maximal order of this extension of $\mathbf{A}$.


The remaining interesting question is then: What can be said if
(8.22) is not satisfied for a finite amount of data $A_1, \dots, A_N$ and
any $\ell'$, $\ell''$ satisfying $\ell' + \ell'' = N$? This problem is the topic of
the next section.

(8.25) FINAL COMMENT. An essential feature of B. L. Ho's algorithm
is that it preserves the block structure of the data of the problem. Of
course, one can obtain parallel results by treating $\mathbf{H}_{\ell',\ell''}$ as an
ordinary matrix, disregarding its block-Hankel structure. Such a
procedure requires looking at a minor of $\mathbf{H}$ of maximum rank, and was
described explicitly by SILVERMAN [1966] and SILVERMAN and MEADOWS [1969].
There does not seem to be any obvious computational advantage associated
with the second method.

9. THEORY OF PARTIAL REALIZATIONS

In one obvious respect the theory of realizations developed
in the previous section is rather unsatisfactory: it is concerned
with infinite sequences. From here on we call a system satisfying
(8.3) a complete realization, to distinguish it from the practically
more interesting case given by

(9.1) DEFINITION. Let $\mathbf{A} = (A_1, A_2, \dots)$ be an infinite
sequence of p × m matrices over a fixed field K. A dynamical
system $\Sigma = (F, G, H)$ is a partial realization of order r of
$\mathbf{A}$ iff

    $A_{k+1} = H F^k G$ for k = 0, 1, ..., r.

We shall use the same terminology if, instead of an infinite
sequence $\mathbf{A}$, we are given merely a finite sequence $\mathbf{A}_s = (A_1, \dots, A_s)$,
s ≥ r. The reason for this convention will be clear from the
discussion to follow. We shall call the first r terms of $\mathbf{A}$ a partial
sequence $\mathbf{A}_r$ (of order r).

The  concepts  of  canonical  partial  realization  and  minimal 
partial  realization  will  be  understood  in  exactly  the  same  sense  as  for 
a  complete  realization.  We  warn  the  reader,  however,  that  now  these 
two  notions  will  turn  out  to  be  inequivalent,  in  that 

minimal  partial  =>  canonical  partial 

but  not  conversely. 

Our main interest will be to determine all equivalence classes
of minimal partial realizations; in general, a given sequence will
have infinitely many inequivalent minimal partial realizations if
r is sufficiently small.

According to the Main Theorem (8.21) of the theory of realizations,
the minimal partial realization problem has a unique solution
whenever the rank condition (8.22) is satisfied. If the length r of the
partial sequence is prescribed a priori, it may well happen that (8.22)
does not hold. What to do? Clearly, if we have a minimal partial
realization (F, G, H) of order r we can extend the partial
sequence $\mathbf{A}_r$ on which this realization is based to an infinite
sequence canonically realized by (F, G, H) simply by setting

    $A_k = H F^{k-1} G, \qquad k > r.$

Consequently, we have the preliminary

(9.2) PROPOSITION. The determination of a minimal partial
realization for $\mathbf{A}_r$ is equivalent to the determination of all
extensions of a partial sequence $\mathbf{A}_r$ such that the extended
sequence is

(i) finite-dimensional and, more strongly,

(ii) its dimension is minimal in the class of all extensions.

It is trivial to prove that finite-dimensional extensions exist
for any partial sequence (of finite length). Hence the problem is immediately
reduced to determining extensions which have minimal dimension. The
solution of this latter problem consists of two steps. First, we show
by a trivial argument that the minimal dimension can be bounded from
below by an examination of the Hankel array defined by the partial
sequence. Second, and this is rather surprising, we show that the
lower bound can be actually attained. For further details, especially
the characterization of equivalence classes of the minimal partial
realizations, see KALMAN [1969c and 1970b].

(9.3) DEFINITION. By the Hankel array $\mathbf{H}(\mathbf{A}_r)$ of a partial
sequence $\mathbf{A}_r$ we mean that r × r block Hankel matrix whose (i, j)-th
block is $A_{i+j-1}$ if i + j - 1 ≤ r and undefined otherwise.

In other words, the Hankel array of a partial sequence $\mathbf{A}_r$
consists of block rows and columns made up of subsequences
$A_p, \dots, A_r$ (1 ≤ p ≤ r) of $\mathbf{A}_r$ and blank spaces.

(9.4) PROPOSITION. Let $n_0(\mathbf{A}_r)$ be the number of rows of the
Hankel array of $\mathbf{A}_r$ which are linearly independent of the rows
above them. Then the dimension of a realization of $\mathbf{A}_r$ is at least
$n_0(\mathbf{A}_r)$.

PROOF. The rank of any Hankel matrix of an infinite
sequence $\mathbf{A}$ is a lower bound on the dimension of any realization
of $\mathbf{A}$, by Proposition (8.4). By Proposition (9.2), it suffices
to consider a suitable extension $\mathbf{A}$ of $\mathbf{A}_r$. This implies "filling
in" the blank spaces in the Hankel array of $\mathbf{A}_r$. Regardless of how
$\mathbf{H}(\mathbf{A}_r)$ is filled in, the rank of the resulting r × r block Hankel
matrix is bounded from below by $n_0(\mathbf{A}_r)$. □

By the block symmetry of the Hankel matrix, we would expect
to be able to determine $n_0(\mathbf{A}_r)$ by an analogous examination of the
columns of the Hankel array of $\mathbf{A}_r$, thereby obtaining the same
lower bound. This is indeed true. We prefer not to give a direct
proof, since the result will follow as a corollary of the Main
Theorem (9.7).

The  critical  fact  is  given  by  the 

(9.5) MAIN LEMMA. For a partial sequence $\mathbf{A}_r$ define:

$\lambda'(\mathbf{A}_r)$ = smallest integer such that for $k' > \lambda'$ every
row in the $k'$-th block row of $\mathbf{H}(\mathbf{A}_r)$ is linearly dependent on the
rows above it.

$\lambda''(\mathbf{A}_r)$ = smallest integer such that for $k'' > \lambda''$ every
column in the $k''$-th block column of $\mathbf{H}(\mathbf{A}_r)$
is linearly dependent on the columns to the
left of it.

Every partial sequence $\mathbf{A}_r$ may be extended to an infinite
sequence $\mathbf{A}$ in at least one way such that the condition

(9.6)    $\operatorname{rank} \mathbf{H}_{\mu,\nu}(\mathbf{A}) = n_0(\mathbf{A}_r)$ for all $\mu \ge \lambda'(\mathbf{A}_r)$, $\nu \ge \lambda''(\mathbf{A}_r)$

is satisfied.

PROOF. The existence of the numbers $\lambda'$, $\lambda''$ is trivial.
It suffices to show, for arbitrary r, how to select $A_{r+1}$ in
such a way that the numbers $\lambda'$, $\lambda''$, and $n_0$ remain constant.

Consider the first row of $A_{r+1}$ and examine in turn all the
first rows of the first, second, third, ..., $\lambda'$-th block rows in
$\mathbf{H}(\mathbf{A}_r)$. If the first row of the first block row is linearly
dependent on the rows above it (that is, 0), we fill in the first row
of $A_{r+1}$ using this linear dependence (that is, we make the first
row of $A_{r+1}$ all zeros). This choice of the first row of $A_{r+1}$
will preserve linear dependencies for the first row of every block
row below the second block row, by the definition of the Hankel
pattern. If the first row in the first block row is linearly
independent of those above (that is, contributes 1 to $n_0(\mathbf{A}_r)$),
we pass to the second block row and repeat the procedure. Eventually
the first row of some block row will become linearly dependent on
those above it, except when $\lambda' = r$; in that case, choose the first
row of $A_{r+1}$ to be linearly dependent on the first rows of
$A_1, \dots, A_r$. Repeating this process for the second, third, ... rows
of each block row*, eventually $A_{r+1}$ is determined without
increasing $\lambda'$ or $n_0$.

To complete the proof, we must show that the above definition
of $A_{r+1}$ also preserves the value of $\lambda''$. That is, we must show
that no new independent columns are produced in the Hankel array of
$\mathbf{A}_r$ when $A_{r+1}$ is filled in. This is verified immediately by noting
that the definition of $A_{r+1}$ implies the conditions

    $\operatorname{rank} \mathbf{H}_{r,1} = \operatorname{rank} \mathbf{H}_{r+1,1},$

    $\operatorname{rank} \mathbf{H}_{r-1,2} = \operatorname{rank} \mathbf{H}_{r,2},$

    ...

    $\operatorname{rank} \mathbf{H}_{1,r} = \operatorname{rank} \mathbf{H}_{2,r} = \operatorname{rank} \mathbf{H}_{1,r+1}.$ □


*Of course, now linear dependence in the first step does not
imply that the corresponding row of $A_{r+1}$ will be all zeros.
With the aid of this simple but subtle observation, the problem
is reduced to that covered by the Main Theorem (8.21) of Section 8. We have:


(9.7) MAIN THEOREM FOR MINIMAL PARTIAL REALIZATIONS.* Let $\mathbf{A}_r$
be a partial sequence. Then:


(i) Every minimal realization of $\mathbf{A}_r$ has dimension $n_0(\mathbf{A}_r)$.

(ii) All minimal realizations may be determined with the aid
of B. L. Ho's formulas (8.17-18) with $\lambda' = \lambda'(\mathbf{A}_r)$ and $\lambda'' = \lambda''(\mathbf{A}_r)$
as given by Lemma (9.5).

(iii) If $r \ge \lambda'(\mathbf{A}_r) + \lambda''(\mathbf{A}_r)$ then the minimal realization
is unique. Otherwise there are as many minimal realizations as
there are extensions of $\mathbf{A}_r$ satisfying (9.6).

PROOF. By the Main Lemma (9.5), every partial sequence $\mathbf{A}_r$
has at least one infinite extension which preserves $\lambda'$, $\lambda''$ and
$n_0$. So we can apply the Main Theorem (8.21) of the preceding section.

It follows that the minimal partial realization is unique if
$r \ge \lambda'(\mathbf{A}_r) + \lambda''(\mathbf{A}_r)$ (the $\lambda'(\mathbf{A}_r) \times (\lambda''(\mathbf{A}_r) + 1)$ Hankel matrix can be
filled in completely with the available data); in the contrary case, the
minimal extensions will depend on the manner in which the matrices
$A_{r+1}, \dots, A_{\lambda'+\lambda''}$ have been determined (subject to the requirement
(9.6)). □


In view of the theorem, we are justified in calling the integer
$n_0(\mathbf{A}_r)$ the dimension of $\mathbf{A}_r$.


*A similar result was obtained simultaneously and independently
by T. Tether (Stanford dissertation, 1969).
(9.8) REMARK. The essential point is that the quantities $n_0$,
$\lambda'$, and $\lambda''$ are uniquely determined already from partial data,
irrespective of the possible nonuniqueness of the minimal extensions
of the partial sequence. We warn, however, that this result does
not generalize to all invariants of the minimal realization. For
instance, one cannot determine from $\mathbf{A}_r$ how many cyclic pieces a
minimal realization of $\mathbf{A}_r$ will have: some minimal realizations
may be cyclic and others may not [KALMAN 1970b].

Finally,  let  us  note  also  a  second  consequence  of  the  Main 
Theorem: 

(9.9) COROLLARY. Suppose $n'_0(\mathbf{A}_r)$ is the number of independent
columns of the Hankel array of $\mathbf{A}_r$ (defined analogously with
$n_0(\mathbf{A}_r)$). Then dim $\mathbf{A}_r = n'_0(\mathbf{A}_r)$.

PROOF. If $n'_0(\mathbf{A}_r) > n_0(\mathbf{A}_r)$ then, using the Main Theorem,
we get a contradiction to the fact that the rank of any Hankel matrix
of an infinite sequence is a lower bound for the dimension of any
realization (Proposition (8.4)). If $n'_0(\mathbf{A}_r) < n_0(\mathbf{A}_r)$ then, extending $\mathbf{A}_r$
to any $\mathbf{A}_{\lambda'+\lambda''}$, we contradict the fact that rank $\mathbf{H}_{\lambda',\lambda''}$ is at least
equal to $n_0(\mathbf{A}_r)$. □

In other words, the characteristic property of rank, that
counting rank by row or column dependence yields identical results,
is preserved even for incomplete Hankel arrays.

It is useful to check a simple case which illustrates some of
the technicalities of the proof of the Main Lemma.

(9.10) EXAMPLE. The dimension of $(0, 0, \dots, 0, A_r)$ is precisely
$r \cdot \rho$, where $\rho$ = rank $A_r$ and $\lambda' = \lambda'' = r$.
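
The row count $n_0$ of Proposition (9.4) can be computed directly from the partial data, and Example (9.10) gives a convenient check. The sketch below rests on one interpretive assumption: a row of the incomplete Hankel array is called dependent on the rows above it when its known entries depend linearly on the corresponding entries of those rows. The function names are hypothetical and ranks are taken over the rationals.

```python
from fractions import Fraction

def rank(M):
    """Rank over the rationals, by Gauss-Jordan elimination."""
    rows = [[Fraction(x) for x in row] for row in M]
    ncols = len(rows[0]) if rows else 0
    rk = 0
    for c in range(ncols):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][c] != 0:
                factor = rows[i][c] / rows[rk][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def independent_row_count(partial):
    """Rows of the Hankel array of `partial` independent of the rows above them.

    partial[k] holds the p x m block A_{k+1}; block row i of the array
    contains A_{i+1}, ..., A_r followed by blanks.
    """
    r = len(partial)
    p, m = len(partial[0]), len(partial[0][0])
    rows_so_far = []
    count = 0
    for i in range(r):
        width = (r - i) * m                # known scalar columns in block row i
        for k in range(p):
            row = []
            for j in range(r - i):
                row.extend(partial[i + j][k])
            # Rows above are at least as long; compare on the known columns only.
            above = [prev[:width] for prev in rows_so_far]
            if rank(above + [row]) > rank(above):
                count += 1
            rows_so_far.append(row)
    return count

# Example (9.10) with r = 3 and the scalar block A_3 = 1: expect r * rho = 3.
print(independent_row_count([[[0]], [[0]], [[1]]]))         # 3
# A hypothetical sequence 0, 1, 3, 7 that has a two-dimensional realization.
print(independent_row_count([[[0]], [[1]], [[3]], [[7]]]))  # 2
```

A block case behaves the same way: with r = 2, a zero 2 × 2 block followed by a 2 × 2 block of rank 1, the count is 2.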
10. GENERAL THEORY OF OBSERVABILITY

In this concluding section, we wish to discuss the problem of
observability in a rather general setting: we will not assume
linearity, at least in the beginning. This is an ambitious program
and leads to many more problems than results. Still, I think it is
interesting to give some indication of the difficulties, which are
conceptual as well as mathematical. This discussion can also
serve as an introduction to very recent research [KALMAN 1969a,
1970a] on the observability problem in certain classes of nonlinear
systems.

The motivation for this section, as indeed for the whole theory
of observability, stems from the writer's discovery [KALMAN 1960a]
that the problem of (linear) statistical prediction and filtering
can be formulated and resolved very effectively by consistent use
of dynamical concepts and methods, and that this whole theory is a
strict dual of the theory of optimal control of linear systems with
quadratic Lagrangian. For those who are familiar with the standard
classical theory of statistical filtering (see, for instance, YAGLOM
[1962]), we can summarize the situation very simply by saying that

    Wiener-Kolmogorov filter

    + theory of finite-dimensional linear dynamical systems

    = Kalman filter.


For the latter, the original papers are [KALMAN 1960a, 1963a] and
[KALMAN and BUCY 1961].



The reader interested in further details and a modern exposition is
referred especially to the monograph of KALMAN [1969b].

We shall examine here only one aspect of this theory (which
does not involve any stochastic elements): the strict formulation
of the "duality principle" between reachability and observability.

This principle was formally stated for the first time by KALMAN [1960c], but
the pertinent discussion in this paper is limited to the linear case and
is somewhat ad hoc. Aided by research progress since 1960, it is
now possible to develop a completely general approach to the "duality
principle". We shall do this and, as a by-product, we shall obtain
a new and strictly deductive proof of the principle in the now
classical linear case.

We shall introduce a general notion of the "dual" system, and
use it to replace the problem of observability by an equivalent
problem of reachability. In keeping with the point of view of the
earlier lectures, we shall view a system in terms of its input/output
map f and dualize f (rather than Σ). The constructibility
problem will not be of direct interest, since its theory is similar
to that of the observability problem.

Let $\Omega$, $\Gamma$ be the same sets as defined in Section U and used
from then on. We assume that both $\Omega$ and $\Gamma$ are K-vector spaces
(K = arbitrary field) and recall the definition of the shift
operators $\sigma_\Omega$ and $\sigma_\Gamma$ on $\Omega$ and $\Gamma$ (see (3.10)). We denote
both shift operators by z but ignore, until later, the K[z]-module
structure on $\Omega$ and $\Gamma$.

By a constant (not necessarily linear) input/output map
$f: \Omega \to \Gamma$ we shall mean any map f which commutes with the shift
operators, that is,

    $f(z \cdot \omega) = z \cdot (f(\omega)).$

Let  us  now  formulate  the  general  problem  of  this  section: 

(10.1) PROBLEM OF OBSERVABILITY. Given an input/output map f,
its canonical realization Σ, and an input sequence $\nu \in \Omega$ applied
after t = 0. Determine the state x of Σ at t = 0 from
the knowledge of the output sequence of Σ after t = 0.

This problem cannot be solved in general! To see this, recall
that the state set of $f$ may be viewed as a set of functions

$\{ f(\omega \circ \cdot)(1): \Omega \to K^p: \nu \mapsto f(\omega \circ \nu)(1) \}$

since $\omega'$ is Nerode-equivalent to $\omega$ iff

$f(\omega' \circ \cdot)(1) = f(\omega \circ \cdot)(1)$.

Giving $\nu \in \Omega$ and the corresponding output sequence amounts to
giving various values of $f(\omega \circ \cdot)(1)$ (namely those corresponding
to the sequences $0$, $\nu_{|\nu|}$, $z\nu_{|\nu|} + \nu_{|\nu|-1}$, ..., $\nu$, $z\nu$, $z^2\nu$, ...), and
it may happen that these substitutions do not yield enough values of
the function $f(\omega \circ \cdot)(1)$ to determine the function itself. This
situation  has  been  recognized  for  a  long  time  in  automata  theory, 
where, in an almost self-explanatory terminology, one says that
"$\Sigma$ is initial-state determinable by an infinite multiple experiment
(possibly infinitely many different $\nu$'s) but not necessarily by a
single experiment (single $\nu$ chosen at will)." See MOORE [1956].

The problem is further complicated by the fact that it may make a
difference whether or not we have a free choice of $\nu$. KALMAN,
FALB, and ARBIB [1969, Section 6.3] give some related comments.

A  further  difficulty  inherent  in  the  preceding  discussion  is 
that the problem is posed on a purely set-theoretic level and does
not lend itself to the introduction of more refined structural
assumptions. We shall therefore reformulate the problem in such
a way as to focus attention on determining those properties of the
initial  state  which  can  be  computed  from  the  combined  knowledge  of 
the  input  and  output  sequence  occurring  after  t  =  0. 

For simplicity, we shall fix the value of $\nu$ at 0 (no loss of
generality, since $f$ is not linear). Then the output sequence
resulting from $x$ after t = 0 is given simply as $f(\omega)$, where
$x = [\omega]_f$.

We  shall  use  the  circumflex  to  denote  certain  classes  of 
functions  from  a  set  into  the  field  K.  For  the  moment,  this 
class  will  be  the  class  of  all  functions.  Thus 

$\hat{\Gamma}$ = {all functions $\Gamma \to K$}.

An element $\hat{\gamma}$ of $\hat{\Gamma}$ is simply a "rule" (in practice, a computing
algorithm) which assigns to each possible output sequence $\gamma$ in $\Gamma$
a number in the field K. If $\gamma$ resulted from the state $x = [\omega]_f$,
then

$\hat{\gamma}(\gamma) = \hat{\gamma}(f(\omega)) = (\hat{\gamma} \circ f)(\omega)$

gives the value of a certain function in $\hat{\Omega}$ and, by definition of
the state, also the value of a certain function in $\hat{X}_f$. This suggests
the

(10.2) DEFINITION. An element $\hat{x} \in \hat{X}_f$ is an observable costate
iff there is a $\hat{\gamma}_{\hat{x}} \in \hat{\Gamma}$ such that we have identically for all
$\omega \in \Omega$

$\hat{x}([\omega]_f) = \hat{\gamma}_{\hat{x}}(f(\omega))$.

In other words, no matter what the initial state $x = [\omega]_f$ is,
the value of $\hat{x}$ at $x$ can always be determined by applying the
rule $\hat{\gamma}_{\hat{x}}$ to the output sequence $f(\omega)$ resulting from $x$. Note,
carefully, that this definition subsumes (i) a fixed choice of the
class of functions denoted by the circumflex, and (ii) a fixed input
sequence after t = 0 (here $\nu = 0$). For certain purposes, it
may be necessary to generalize the definition in various ways
[KALMAN 1970a], but here we wish to avoid all unessential complications.

According to Definition (10.2), we shall say that a system is
completely observable iff every costate is observable. This agrees
with the point of view adopted earlier (see Section 4) in an ad-hoc
fashion. Also, the vague requirement to "determine $x$" used in
(10.1) is now replaced by a precise notion which can be manipulated
(via the actual definition of the circumflex) to express limitations
on the algorithms that we may apply to the output sequence of the
system.

The requirement "every costate is observable" can often be
replaced by a much simpler one. For instance, if $X$ is a vector
space, it is enough to know that "every linear costate is observable"
or even just that "every element of some dual basis is an observable
costate"; if $X$ is an algebraic variety, it is natural to interpret
"complete observability" as "every element of the coordinate ring of
$X$ is an observable costate" [KALMAN 1970a].

We can now carry out a straightforward "dualization" of the
setup involved in the definition of the input/output map $f: \Omega \to \Gamma$.
First, we adopt (again with respect to a fixed interpretation of the
circumflex):

(10.3) DEFINITION. The dual of an input/output map $f: \Omega \to \Gamma$
is the map

$\hat{f}: \hat{\Gamma} \to \hat{\Omega}: \hat{\gamma} \mapsto \hat{\gamma} \circ f$.

Note that $\hat{f}$ is well-defined, since the circumflex means the class
of all functions.

As to the next step, we wish to prove that constancy is inherited
under dualization. To do this, we have to induce a definition of the
shift operator on $\hat{\Gamma}$ and $\hat{\Omega}$. The only possible definitions are the
obvious ones:

$\sigma_{\hat{\Gamma}}: \hat{\Gamma} \to \hat{\Gamma}: \hat{\gamma} \mapsto [\sigma_{\hat{\Gamma}}\hat{\gamma}: \gamma \mapsto \hat{\gamma}(\sigma_\Gamma \gamma)]$,

$\sigma_{\hat{\Omega}}: \hat{\Omega} \to \hat{\Omega}: \hat{\omega} \mapsto [\sigma_{\hat{\Omega}}\hat{\omega}: \omega \mapsto \hat{\omega}(\sigma_\Omega \omega)]$.

Both of these new shift operators will be denoted by $z^{-1}$.
The reason for this notation will become clear later.

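Now it is easy to verify:

(10.4) PROPOSITION. If $f$ is constant, so is $\hat{f}$.

PROOF. We apply the definitions in suitable sequence:

$(\hat{f}(z^{-1} \cdot \hat{\gamma}))(\omega) = (z^{-1} \cdot \hat{\gamma})(f(\omega))$    (def. of $\hat{f}$),
    $= \hat{\gamma}(z \cdot f(\omega))$    (def. of $\sigma_{\hat{\Gamma}}$),
    $= \hat{\gamma}(f(z \cdot \omega))$    ($f$ is constant),
    $= \hat{f}(\hat{\gamma})(z \cdot \omega)$    (def. of $\hat{f}$),
    $= (z^{-1} \cdot \hat{f}(\hat{\gamma}))(\omega)$    (def. of $\sigma_{\hat{\Omega}}$),

and so we see that $\hat{f}$ commutes with $z^{-1}$ whenever $f$ commutes with $z$.    □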
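
The chain of equalities in the proof of (10.4) can be tried out on a small model. The following Python sketch is only an illustration under assumptions that are not part of the text: an input sequence is a finite tuple whose last entry is applied at t = 0, an output sequence is a function t -> y(t) for t >= 1, and the constant map f comes from a hypothetical nonlinear state update x -> (x*x + u) mod 7 with the state as output. All function names are ours.

```python
def run(omega, extra_zeros, x0=0):
    """State reached from x0 after the inputs in omega, then extra_zeros zero inputs."""
    x = x0
    for u in tuple(omega) + (0,) * extra_zeros:
        x = (x * x + u) % 7  # hypothetical nonlinear dynamics, chosen only for illustration
    return x

def f(omega):
    """Input/output map: returns the output sequence t -> y(t), t >= 1."""
    return lambda t: run(omega, t - 1)

def shift_omega(omega):
    """z on the input set: append one zero input."""
    return tuple(omega) + (0,)

def shift_gamma(gamma):
    """z on the output set: drop the first output term."""
    return lambda t: gamma(t + 1)

def dual(f):
    """Dual map of (10.3): gamma_hat -> gamma_hat composed with f."""
    return lambda gamma_hat: (lambda omega: gamma_hat(f(omega)))

def dual_shift_gamma(gamma_hat):
    """Induced shift on functions of output sequences."""
    return lambda gamma: gamma_hat(shift_gamma(gamma))

def dual_shift_omega(omega_hat):
    """Induced shift on functions of input sequences."""
    return lambda omega: omega_hat(shift_omega(omega))

# A rule that looks only at a finite portion of the output sequence.
gamma_hat = lambda gamma: gamma(1) + 2 * gamma(3)

omega = (1, 2, 0)
lhs = dual(f)(dual_shift_gamma(gamma_hat))(omega)   # dual map applied after the dual shift
rhs = dual_shift_omega(dual(f)(gamma_hat))(omega)   # dual shift applied after the dual map
print(lhs == rhs)
```

The rule gamma_hat was chosen to depend on finitely many output terms, so the same toy model can also be used to experiment with a finiteness hypothesis of the kind discussed next.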


At this stage, we cannot as yet view $\hat{f}$ as the input/output map
of a dynamical system because concatenation is not yet defined on $\hat{\Gamma}$,
and therefore $\hat{\Gamma}$ is not yet a properly defined "input set".
In other words, it is necessary to check that the notion of time is
also inherited under dualization. In general, this does not appear
to be possible without some strong limitation on the class $\hat{\Gamma}$. Here
we shall look only at the simplest

(10.5) HYPOTHESIS. Every function $\hat{\gamma}$ in $\hat{\Gamma}$ satisfies the
finiteness condition: There is an integer $|\hat{\gamma}|$ (dependent on $\hat{\gamma}$)
such that for all $\gamma, \delta \in \Gamma$ the condition

$\gamma_k = \delta_k$, $k = 1, \ldots, |\hat{\gamma}|$

implies

$\hat{\gamma}(\gamma) = \hat{\gamma}(\delta)$.

In other words, we assume that the value of each $\hat{\gamma}$ at $\gamma$
is uniquely determined by some finite portion of the output sequence
$\gamma$.

Assuming (10.5), it is immediate that $\hat{\Gamma}$ admits a concatenation
multiplication which corresponds (at least intuitively) to the usual
one defined on $\Omega$:

(10.6) $\hat{\gamma} \circ \hat{\delta}$ =

We can now prove the expected theorem, which may be regarded
as the precise form of the "duality" principle:

(10.7) THEOREM. Let $f$ be an arbitrary constant input/output
map and $\hat{f}$ its dual. Suppose further that (10.5) holds. Then
each observable costate of $f$ (relative to $\hat{\Gamma}$ satisfying (10.5))
may be viewed as a reachable state of $\hat{f}$, and conversely.

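PROOF. First we determine the Nerode equivalence classes on
$\hat{\Gamma}$ induced by $\hat{f}$. By definition

$\hat{\delta} \in [\hat{\gamma}]_{\hat{f}}$ iff $\hat{f}(\hat{\delta} \circ \hat{\varepsilon}) = \hat{f}(\hat{\gamma} \circ \hat{\varepsilon})$

for all $\hat{\varepsilon} \in \hat{\Gamma}$. Now $\hat{f}$ is linear (!); in fact, direct use of
the definition of $\hat{f}$ and (10.6) gives

$\hat{\delta} \in [\hat{\gamma}]_{\hat{f}}$ iff $(\hat{\gamma} \circ f)(\omega) = (\hat{\delta} \circ f)(\omega)$, $\omega \in \Omega$.

So $\hat{\gamma} \circ f$ and $\hat{\delta} \circ f$ are equal as elements of $\hat{X}_f$: they define the
same observable costate. In fancier language, the assignment

(10.8) $X_{\hat{f}} \to \hat{X}_f: [\hat{\gamma}]_{\hat{f}} \mapsto \hat{\gamma} \circ f$

is well defined and constitutes a bijection between the reachable
states of $\hat{f}$ and those costates of $f$ which are observable
relative to the function class $\hat{\Gamma}$.    □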

Thus (10.5) is a sufficient condition for the duality principle
to hold. However, the fact that the canonical realization of $\hat{f}$ is
completely reachable is not quite the same as saying that the canonical
realization of $f$ is completely observable because the latter depends
on the choice of $\hat{\Gamma}$ and therefore is not an intrinsic property of $f$.
Moreover, Theorem (10.7) does not give any indication how "big" $X_{\hat{f}}$ is
and it may certainly happen that the observability problem for $f$ is
much more difficult than the reachability problem. These matters will
be illustrated later by some examples.

Now we deduce the original form of the duality principle from
Theorem (10.7). The essential point is that (10.5) holds automatically
as a result of linearity.

New definition of the function class: let the circumflex denote
the class of all K-linear functions. (All the underlying sets are
K-vector spaces, so the definition makes sense.)

The following facts are well known:

(10.9) PROPOSITION. Let * denote duality in the sense of
K-vector spaces. Then:

$\hat{\Gamma} = (K^p[[z^{-1}]])^* = K^p[z^{-1}]$,

$\hat{\Omega} = (K^m[z])^* = K^m[[z]]$.

Now we can state the

(10.10) MAIN THEOREM. Suppose $f$ is K-linear, constant, finite-dimensional.
Suppose further that * means K-linear duality. Then:

(i) $\hat{f}$ is K-linear and constant, that is, a $K[z^{-1}]$-homomorphism
(and therefore written as $f^*$) and finite-dimensional.

(ii) The reachable states of $f^*$ are isomorphic with the
K-linear dual of $X_f$; hence every costate of $X_f$ is observable.

PROOF. The fact that $f$ is K-linear implies, by (10.3),
that $\hat{f}$ is K-linear; the constancy of $f$ always implies that of
$\hat{f}$, by Proposition (10.4). (Caution: $\hat{f}$ is not the K[z]-linear
dual of the K[z]-homomorphism $f$, and the construction given here
cannot be simplified. See Remark (4.26a).)

To prove the second part, we note that by Proposition (10.9)
Hypothesis (10.5) holds and thus $\hat{f} = f^*$ is a well-defined input/output
map of a dynamical system. We must prove that the reachable states
of $f^*$ are isomorphic with $X_f^*$, the K-linear dual of $X_f$. This
amounts to proving that the K-vector space of functions

$x \mapsto \hat{\gamma}(h_f(x), h_f(z \cdot x), \ldots)$

is isomorphic with the K-vector space $X_f^*$. It suffices to prove
that the K-vector space generated by the K-linear functions

(10.11) $\{\lambda: x \mapsto [h_f(z^i \cdot x)]_j$, $i = 0, 1, \ldots$ and $j = 1, \ldots, m\}$

is isomorphic with $X_f^*$. Suppose that, for fixed $x$, every $\lambda(x) = 0$.
Then $x = 0$, by definition of the Nerode equivalence relation induced
by $f$ (recall here the discussion from Section 3). Since $X_f$ is
finite-dimensional by hypothesis, it follows from this property of
the functions $\{\lambda\}$ that they generate $X_f^*$. Obviously, dim $X_f^*$ = dim $X_f$,
so that everything is proved.    □
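
In matrix terms the argument can be checked by a rank computation. The sketch below assumes, beyond what the proof states, that the state space is K^n with z·x = Fx and h_f(x) = Hx for matrices F (n by n) and H, and that K is the field of rationals; the helper names are hypothetical. The functions of (10.11) are then the rows of H, HF, HF^2, ..., and by the Cayley-Hamilton theorem the powers up to n - 1 suffice. These rows generate the dual space exactly when their rank is n.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def rank(rows):
    """Rank over the rationals by Gauss-Jordan elimination."""
    m = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][c] != 0:
                factor = m[i][c] / m[rk][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

def costate_rows(F, H):
    """Rows of H, HF, ..., HF^(n-1): coefficient vectors of x -> [h(z^i x)]_j."""
    rows, block = [], [list(r) for r in H]
    for _ in range(len(F)):
        rows.extend(block)
        block = mat_mul(block, F)
    return rows

def every_linear_costate_observable(F, H):
    """True iff the rows above span the whole dual space of K^n."""
    return rank(costate_rows(F, H)) == len(F)

F = [[0, 1], [0, 0]]
print(every_linear_costate_observable(F, [[1, 0]]))  # rows (1,0), (0,1)
print(every_linear_costate_observable(F, [[0, 1]]))  # rows (0,1), (0,0)
```

The second pair (F, H) is not a canonical realization of its input/output map, which is why the rank test fails there; the theorem itself concerns only the canonical case.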

In other terms, the fact that $f$ = K[z]-homomorphism together
with the appropriate definition of ^ implies that

$\hat{f}: K^p[z^{-1}] \to K^m[[z]]$

is a $K[z^{-1}]$-homomorphism. Since (10.5) holds, we can interpret
$\hat{f}$ in a system-theoretic way, as follows: the output of the dual
system at t = -k due to input $\hat{\gamma}$ is given by the assignment

$\hat{\gamma} \mapsto \hat{f}(\hat{\gamma})(-k)$,

which is a linear function defined on the k-th term of the input
sequence. In fact, we have

$\hat{\gamma}(\gamma) = \hat{\gamma}(f(\omega))$
    $= (\hat{\gamma} \circ f)(\omega)$
    $= \sum_k (\hat{f}(\hat{\gamma})(-k))(\omega_k)$.

(10.12) REMARK. It is essentially a consequence of Proposition (10.9)
that $\hat{f}$ turns out to be the same kind of algebraic object as $f$. Note,
however, that

under duality the input and output terminals are
interchanged and t is replaced by -t (hence $z$
by $z^{-1}$).

In terms of the pictorial definition of a system, this
statement  simply  amounts  to  "reversing  the  directions  of  the  arrows", 
which  is  the  "right"  way  to  define  duality  in  the  most  general 
mathematical  context,  namely  in  category  theory.  We  would  expect 
that  the  duality  principles  of  system  theory  will  eventually  become 
a  part  of  this  very  general  duality  theory.  This  has  not  happened 
yet  because  the  correct  categories  to  be  considered  in  the  study  of 
dynamical  systems  have  not  yet  been  determined.  It  is  likely  that 
eventually many different categories will have to be looked at in
studying  dynamical  problems. 

We  shall  now  present  an  example  which  should  help  to  interpret 
the  previous  results.  We  emphasize,  however,  that  the  theory  sketched 
here  is  still  in  a  very  rudimentary  form. 

(10.13) EXAMPLE. Consider the system $\Sigma$ defined by

$x(t+1) = 2x(t) + u(t)$, $y(t) = x(t)$, $t \in \mathbf{Z}$;

$y(t) = 0$ if $0 \le x(t) < 1/2$,
$y(t) = 1$ if $1/2 \le x(t) < 1$,

with X = U = Y = R mod 1, i.e., the interval [0, 1). (1 is to
be thought of as identified with 0.) We let u(t) = 0. We view
$x$ through its binary representation

$x = \sum_{k=1}^{\infty} \xi_k(x) 2^{-k}$, $\xi_k(x)$ = 0 or 1.

It is clear from the definition of the system that the output
sequence due to any $x$ is precisely

$(\xi_1(x), \xi_2(x), \ldots)$.

If $x$ is irrational, infinitely many terms are needed to identify
it. Consequently, the $x$'s are isomorphic with the Nerode equivalence
classes induced by $f_\Sigma$. So $\Sigma$ cannot be reduced.
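
The behavior of this example is easy to reproduce for rational initial states. The following Python sketch is a minimal illustration, assuming the two-valued readout (0 on the lower half of the interval, 1 on the upper half) and zero input; exact rational arithmetic stands in for R mod 1, and the function name is ours. It shows that the outputs are the binary digits of x and that two different states can share an arbitrarily long initial segment of outputs.

```python
from fractions import Fraction

def outputs(x, steps):
    """First `steps` outputs of x(t+1) = 2 x(t) mod 1 with the two-valued readout."""
    ys = []
    for _ in range(steps):
        ys.append(0 if x < Fraction(1, 2) else 1)
        x = (2 * x) % 1  # doubling map on [0, 1), zero input
    return ys

print(outputs(Fraction(5, 8), 5))   # 5/8 is 0.101 in binary
print(outputs(Fraction(1, 3), 6))   # 1/3 is 0.010101... in binary

# Two distinct states whose first eight outputs coincide.
x1 = Fraction(1, 3)
x2 = x1 + Fraction(1, 2 ** 10)
print(outputs(x1, 8) == outputs(x2, 8), outputs(x1, 10) == outputs(x2, 10))
```

No fixed finite number of observed outputs separates all states, which is the point at issue in the discussion of Hypothesis (10.5) for this system.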

Relative to $\hat{\Gamma}$ = "all functions", every costate of $f_\Sigma$ is
observable, provided that Hypothesis (10.5) is not satisfied. If
it is, then only those costates defined on fixed-length rationals
are observable (more precisely, these are functions which depend only
on a fixed finite subset of the $\xi_k(x)$'s). Thus: either $\hat{f}$ does
not define a dynamical system or not all costates are observable.

Now let us replace the set [0, 1) by its intersection
with the rationals. It is clear that there is now a finite algorithm
for determining $x$: we simply apply the results of partial realization
theory of the previous section. (We take K = $Z_2$ and the
problem is to express $x$ from $(\xi_1(x), \ldots, \xi_s(x))$ as a ratio
of polynomials in $Z_2[z]$, which is always possible since each $x$
is rational.) However, $x$ is not "effectively computable" in the
strict sense since there is no way of knowing when the algorithm
has stopped. In other words, given an arbitrary costate $\hat{x}$ there exists
no fixed rule $\hat{\gamma}_{\hat{x}}$ such that the application of $\hat{\gamma}_{\hat{x}}$ to $\gamma_x$ gives
$\hat{x}(x)$ for all $x$. On the other hand, substituting into $\hat{x}$ the
results of the partial-realization algorithm will give an approximation
to the value of $\hat{x}(x)$ which always converges in a finite
(but a priori unknown) number of steps as more values of the output
sequence are observed. In short, the costate-determination algorithm
has certain pseudo-random elements in it and therefore cannot be
described through the machinery of deterministic dynamical systems.

(Is  there  some  relation  here  to  the  conceptual  difficulties  of 
Quantum  Mechanics?) 

11. HISTORICAL COMMENTS

It is not an exaggeration to say that the entire theory of linear,
constant (and here, discrete-time) dynamical systems can be viewed as
a systematic development of the equivalent algebraic conditions (2.8)
and (2.15).

Of  course,  the  use  of  modules  (over  K[z])  to  study  a  constant 
square  matrix  (see  (4.13))  has  been  "standard"  since  the  1920's  under 
the  influence  of  E.  NOETHER  and  especially  after  the  publication  of 
the Modern Algebra of VAN DER WAERDEN. Condition (2.15), by itself,
must be also quite old. For instance, GANTMAKHER [1959, Vol. 1, p. 203]
attributes to KRYLOV [1931] the idea of computing the characteristic
polynomial of a square matrix A by choosing a random vector b and
computing successively b, Ab, $A^2 b$, ... until linear dependence is
obtained, which yields the coefficients of det (zI - A). (The method
will succeed iff $X_A$ is cyclic with generator b.) However, the
merger of (4.13) with (2.15), which is the essential idea in the algebraic
theory of linear systems, was done explicitly first in KALMAN [1965b].

We  shall  direct  our  remarks  here  mainly  to  the  history  of  conditions 
(2.8) and (2.15) as related to controllability. See also earlier
comments in KALMAN [1960c, pp. 481, 483, 484] and in KALMAN, HO, and
NARENDRA [1963, pp. 210-212]. We will have to bear in mind that the
development of modern control theory cannot be separated from the development
of the concept of controllability; moreover, the technological
problems of the 1950's and even earlier had a major influence on the
genesis  of  mathematical  ideas  (just  as  the  latter  have  led  to  many 
new  technological  applications  of  control  in  the  1960's). 

The writer developed the mathematical definition of controllability
with applications to control theory, during the first part of 1959.
(Unpublished course notes at Johns Hopkins University, 1958/59.) These
first definitions were in the form of (2.17) and (2.3). Formal presentations
of the results were made in Mexico City (September, 1959; see
KALMAN [1960b]), University of California at Berkeley (April, 1960; see
KALMAN [1960d]), and Moskva (June, 1960; see KALMAN [1960c]), and in
scientific lectures on many other concurrent occasions in the U.S. As
far as the writer is aware, a conscious and explicit definition of
controllability  which  combines  a  control-theoretic  wording  with  a 
precise  mathematical  criterion  was  first  given  in  the  above  references. 

There  are  of  course  many  instances  of  similar  ideas  arising  in  related 
contexts.  Perhaps  the  comments  below  can  be  used  as  the  starting  point 
of  a  more  detailed  examination  of  the  situation  in  a  seminar  in  the 
history  of  ideas. 

The  following  is  the  chain  of  the  writer’s  own  ideas  culminating 
in  the  publications  mentioned  above: 

(1) In KALMAN [1954] it is pointed out (using transform methods)
that  continuous-time  linear  systems  can  be  controlled  by  a  linear 
discrete-time  (sampled-data)  controller  in  finite  time.* 

*It is sometimes claimed in the mathematical literature of optimal
control theory that this cannot be done with a linear system. This is false;
the correct statement is "cannot be done with a linear controller producing
control functions which are continuous (and not merely piecewise continuous!)
in time." Such a restriction is completely irrelevant from the technological
point of view. As a matter of fact, computer-controlled systems have been
proposed and built for many years on the basis of linear, time-optimal control.




R.  E.  Kalman 


(2) Transposing the result of KALMAN [1954] from transfer functions
to state variables, an algorithm was sketched for the solution of the
discrete-time time-optimal control of systems with bounded control and
linear continuous-time dynamics. [KALMAN, 1957]

(3) As a popularization of the results of the preceding work, the
same technique was applied to give a general method for the design of
linear sampled-data systems by KALMAN and BERTRAM [1958].

Some  background  comments  concerning  these  papers  are  appropriate: 

(1) The ideas and method presented in KALMAN [1954] descend
directly from earlier (and very well known) engineering research on
time-optimal control. (The main references in KALMAN [1954] are:
MCDONALD [1950], HOPKIN [1951], BOGNER and KAZDA [1954], as well as a
research report included in KALMAN [1955].) Although the results of
KALMAN [1954] on linear time-optimal control were considered to be new
when published, it became clear later that similar ideas were at least
implicit in OLDENBOURG and SARTORIUS [1951, §90, p. 219] and in TSYPKIN's
work in the early 1950's. The engineering idea of nonlinear time-optimal
control goes back, at least, to DOLL [1943] and to OLDENBURGER in 1944,
although the latter's work was unfortunately not widely known before 1957.
During the same time, there was much interest in the same problems in
other countries; see, for instance, FELDBAUM [1953] and UTTLEY and HAMMOND
[1953]. Mathematical work in these problems probably began with BUSHAW's
dissertation [1952] in which, to quote from KALMAN [1955, before equation
(40)], "... [it was] rigorously proved that the intuition which led to
the formulation of the [engineering] theory [quoted above] was indeed
correct." TSIEN's survey [1954] contains a lengthy account of this state
of affairs and was read by many. We emphasize: none of this
extensive literature contains even a hint of the algebraic considerations
related to controllability.

(2-3) The critical insight gained and recorded in KALMAN [1957] is
the following: the solution of the discrete-time time-optimal control
problem is equivalent to expressing the state as a linear combination
of a certain vector sequence (related to control and dynamics) with
coefficients bounded by 1 in absolute value, the coefficients being
the values of the optimal control sequence. The linear independence
of the first n vectors of the sequence guarantees that every point
in a neighborhood of zero can be moved to the origin in at most n
steps (hence the terminology of "complete controllability"); and the
condition for this is identical with (2.17) (stated in KALMAN [1957]
and KALMAN and BERTRAM [1958] only for the case det F ≠ 0 and m = 1).

A thorough discussion of these matters is found in KALMAN [1960c; see
especially Theorem I, p. 485]. A serious conceptual error in KALMAN
[1957] occurred, however, in that complete controllability was not
assumed as a hypothesis for the existence of time-optimal control law,
but an attempt was made to show that the controllability is almost
always complete [Lemma 1]. In fact, this lemma is true, with a small
technical modification in the condition. Only much later did it become
clear (see the discussion of Theorem D in the Introduction), however,
that a dynamical system is always completely controllable (in the nonconstant
case, completely reachable) if it is derived from an external description. It was
this difficulty, very mysterious in 1957, which led to the development
of a formal machinery for the definition of controllability during the
next two years. The changing point of view is already apparent in
KALMAN and BERTRAM [1958]; the unpublished paper promised there was
delayed precisely because the algebraic machinery to prove Theorem D
was out of reach in 1957-8. Consult also the findings of the bibliographer
RUDOLF [1969].

IN SUMMARY: under the stimulation of the engineering problems
of minimal-time optimal control, the researches begun by KALMAN [1954,
1957] and KALMAN and BERTRAM [1958] eventually evolved into what has
come to be called the mathematical theory of controllability (of linear
systems).

Beginning about 1955, and stimulated by the same engineering
problems, PONTRYAGIN and his school in the USSR developed their
mathematical theory of optimal control around the celebrated "Maximum
Principle". (They were well aware of the survey of TSIEN [1954]
mentioned above, and referenced it both in English and in the Russian
translation  of  1958.)  We  now  know  that  any  theory  of  control,  regard¬ 
less  of  its  particular  mathematical  style,  must  contain  ingredients 
related  to  controllability.  So  it  is  interesting  to  examine  how 
explicitly  the  controllability  condition  appears  in  the  work  of  PONTRYAGIN 
and  related  research. 

GAMKRELIDZE [1957, §2; 1958 §1, §2] calls the time optimal control
problem associated with the system

(11.1) $dx/dt = Ax + bu(t)$

"nondegenerate" iff b is not contained in a proper A-invariant
subspace of $R^n$. He notes immediately that this is equivalent to

(11.2) $\det(b, Ab, \ldots, A^{n-1}b) \neq 0$

(i.e., the special case of (2.8) for m = 1). He then proves: in
the "degenerate" case the problem either reduces to a simpler one or
the motion cannot be influenced by the control function $u(\cdot)$. All
this is very close to an explicit definition of controllability.

However, in discussing the general case m > 1, GAMKRELIDZE [1958,
§3, Section 1] defines "nondegeneracy" of the system

(11.3) $dx/dt = Ax + Bu(t)$

as the condition

(11.4) $\det(b_i, Ab_i, \ldots, A^{n-1}b_i) \neq 0$ for every column $b_i$ of B,

but he does not show that this generalized condition of "nondegeneracy" for (11.3)
inherits the interesting characterization proved for "nondegeneracy"
in the case of (11.1). In fact, condition (11.4) is much too strong
to prove this; the correct condition is (2.8), that is, complete
controllability. In other words, in GAMKRELIDZE's work (11.4) plays
the role of a technical condition for eliminating "degeneracy" (actually,
lack of uniqueness) from a particular optimal control problem and is
not explicitly related to the more basic notion of complete controllability.
Neither GAMKRELIDZE nor PONTRYAGIN [1958] give an interpretation of
(11.4) as a property of the dynamical system (11.3), but employ (11.4)
only in relation to the particular problem of time-optimal control. See
also KALMAN [1960c, p. 484]. A similar point of view is taken by
LASALLE [1960]; he calls a dynamical system (11.3) satisfying (2.8)
"proper" but then goes on to require (11.4) (to assure the uniqueness
of the time-optimal controls) and calls such systems "normal".
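
The gap between the two conditions can be seen on a very small example. The sketch below assumes that (2.8) is the rank condition rank (B, AB, ..., A^{n-1}B) = n, of which (11.2) is the case m = 1 as noted above; the matrices are hypothetical and are restricted to n = m = 2 so that a 2 by 2 determinant suffices. The function names are ours.

```python
from itertools import combinations

def mat_vec(A, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def det2(c1, c2):
    """Determinant of the 2 by 2 matrix with columns c1, c2."""
    return c1[0] * c2[1] - c1[1] * c2[0]

def columns(B):
    return [list(col) for col in zip(*B)]

def satisfies_2_8(A, B):
    """rank (B, AB) = 2, i.e. some pair of columns of (B, AB) is independent."""
    cols = columns(B) + [mat_vec(A, b) for b in columns(B)]
    return any(det2(c1, c2) != 0 for c1, c2 in combinations(cols, 2))

def satisfies_11_4(A, B):
    """det (b_i, A b_i) != 0 for every column b_i of B."""
    return all(det2(b, mat_vec(A, b)) != 0 for b in columns(B))

B = [[1, 0], [0, 1]]
A_diag = [[1, 0], [0, 2]]   # each column of B spans an A-invariant line
A_swap = [[0, 1], [1, 0]]   # A exchanges the two columns of B

print(satisfies_2_8(A_diag, B), satisfies_11_4(A_diag, B))
print(satisfies_2_8(A_swap, B), satisfies_11_4(A_swap, B))
```

With A_diag the pair is completely controllable in the sense of the assumed rank condition, yet neither input channel alone satisfies (11.2), so (11.4) fails; with A_swap both conditions hold.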

The  assumption  of  some  kind  of  "nondegeneracy"  condition  was 
apparently unavoidable in the early phases of research on the time-optimal
control problem. For example, ROSE [1953, pp. 39-58] examines
this problem for (11.1); by defining "nondegeneracy" [p. 41] by a
condition equivalent to (11.2), he obtains most of GAMKRELIDZE's results
in the special case when A has real eigenvalues [Theorem 12]. ROSE
uses  determinants  closely  related  to  the  now  familiar  lemmas  in  control¬ 
lability  theory  but  he,  too,  fails  to  formulate  controllability  as  a 
concept  independent  of  the  time-optimal  control  problem. 

A  similar  situation  exists  in  the  calculus  of  variations.  The 
so-called  Caratheodory  classes  (after  CARATHEODORY  [1933])  correspond 
to  a  kind  of  classification  of  controllability  properties  of  nonconstant 
systems.  In  fact,  the  standard  notion  of  a  normal  family  of  extremals 
of the calculus of variations is closely related to condition (11.4),
suitably generalized via (2.5) to nonconstant systems.* Normality is
used in the calculus of variations mainly as a "nondegeneracy" condition.

It is important to note that the "nondegeneracy" conditions
employed in optimal control and the calculus of variations play mainly the
role of eliminating annoying technicalities and simplifying proofs.

*The use of the word "normal" by LaSALLE [1960] for (11.4) is only
accidentally coincident with the earlier use of "normal" in the
calculus  of  variations. 

With suitable formulation, however, the basic results of time-optimal
control theory continue to hold without the assumption of complete
controllability. The same is not true, however, of the four kinds of
theorems mentioned in the Introduction, and therefore these results
are more relevant to the story of controllability than the time-optimal
control discussed above.

There is a considerable body of literature relevant to controllability
theory which is quite independent of control theory. For instance, the
treatment of a reachability condition in partial differential equations
goes back at least to CHOW [1940] but perhaps it is fairer to attribute
it to Caratheodory's well-known approach to entropy via the nonintegrability
condition. The current status of these ideas as related to
controllability is reviewed by WEISS [1969, Section 9]. An independent
and very explicit study of reachability is due to ROXIN [1960]; unfortunately,
his examples were purely geometric and therefore the paper
did not help in clarifying the celebrated condition (2.8). The
Wronskian determinant of the classical theory of ordinary differential
equations with variable coefficients also has intersections with controllability
theory, as pointed out recently with considerable success by
SILVERMAN [1966]. Many problems in control theory were misunderstood
or even incorrectly solved before the advent of controllability theory.
Some of these are mentioned in KALMAN [1963b, Section 9]. For relations
with automata theory, see ARBIB [1965].

Let us conclude by stating the writer's own current position as
to  the  significance  of  controllability  as  a  subject  in  mathematics: 

(1) Controllability is basically an algebraic concept. (This
claim applies of course also to the nonlinear controllability results
obtained via the Pfaffian method.)

(2)  The  historical  development  of  controllability  was  heavily 
influenced by the interest prevailing in the 1950's in optimal control
theory.  Ultimately,  however,  controllability  is  seen  as  a  relatively 
minor  component  of  that  theory. 

(3)  Controllability  as  a  conceptual  tool  is  indispensable  in 
the  discussion  of  the  relationship  between  transfer  functions  and 
differential  equations  and  in  questions  relating  to  the  four  theorems 
of  the  Introduction. 

(4) The chief current problem in controllability theory is the
extension  to  more  elaborate  algebraic  structures. 

For  a  survey  of  the  historical  background  of  observability, 
which  would  take  us  too  far  afield  here,  the  reader  should  consult 
KALMAN  [1969b]. 